IEEE Micro

In modern IP routers, Internet Protocol (IP) lookup forms a bottleneck in packet forwarding because the lookup speed cannot catch up with the increase in link bandwidth. Ternary content-addressable memories (TCAMs) have emerged as viable devices for designing high-throughput forwarding engines on routers. Called ternary because they store don't-care states in addition to 0s and 1s, TCAMs search the data (IP address) in a single clock cycle. Because of this property, TCAMs are particularly attractive for packet forwarding and classification. Despite these advantages, large TCAM arrays have high power consumption and lack scalable design schemes, which limit their use.

Today's high-density TCAMs consume 12 to 15 W per chip when the entire memory is enabled. To support the superlinearly increasing number of IP prefixes in core routers, vendors use up to eight TCAM chips. Filtering and packet classification would also require additional chips. The high power consumption of using many chips increases cooling costs and also limits the router design to fewer ports.1

Recently, researchers have proposed a few approaches to reducing power consumption in TCAMs,1,2 including routing-table compaction.3,4 Liu presents a novel technique to eliminate redundancies in the routing table.3 However, this technique takes excessive time for update because it is based on the Espresso-II minimization algorithm,5 which increases exponentially in complexity with the number of prefixes in a routing table. Thus, our work's main objective is a TCAM-based router architecture that consumes less power and is suitable for the incremental updating that modern IP routers need. Additionally, the approach we will describe minimizes the memory size required for storing the prefixes.

We propose a two-level pipelined architecture that reduces power consumption through memory compaction and the selective enablement of only a portion of the TCAM array. We also introduce the ideas of prefix aggregation and prefix expansion to reduce the number of routing-table entries in TCAMs for IP lookup.

    Ravikumar V.C.

    Rabi N. Mahapatra

    Texas A&M University

    TERNARY CONTENT-ADDRESSABLE MEMORY (TCAM) IS A POPULAR

    HARDWARE DEVICE FOR FAST ROUTING LOOKUP. THOUGH USEFUL, TCAM

    ARRAYS HAVE HIGH POWER CONSUMPTION AND HEAT DISSIPATION,

    PROBLEMS ALLEVIATED BY REDUCING THE NUMBER OF ROUTING-TABLE

    ENTRIES. THE AUTHORS PRESENT AN APPROACH THAT EXPLOITS TWO

    PROPERTIES OF PREFIXES TO COMPACT THE ROUTING TABLE.

TCAM ARCHITECTURE FOR IP LOOKUP USING

PREFIX PROPERTIES

    Published by the IEEE Computer Society 0272-1732/04/$20.00 2004 IEEE


Today, TCAM vendors provide mechanisms to enable a chunk of TCAM much smaller than the entire TCAM array. Exploiting this technology reduces power consumption during lookup in the TCAM-based router. We also discuss an efficient incremental update scheme for the routing of prefixes and provide empirical equations for estimating memory requirements and proportional power consumption for the proposed architecture. Finally, we use traces from the bbnplanet, attcanada, utah, and is routers to validate the proposed TCAM architecture's effectiveness.

Background and related work

We can categorize approaches to performing IP lookup as either software or hardware based. Researchers have proposed various software solutions for routing lookup.6,7 However, these software approaches are too slow for use in complex packet processing at every router, mainly because they require too many memory accesses: Independent of their design or implementation techniques, software approaches take at least four to six memory accesses for a single lookup operation. Today's packet processing requires speeds of about 40 Gbps; software approaches achieve no more than 10 Gbps.8

Hardware approaches typically use dedicated hardware for routing lookup.9,10 More popular techniques use commercially available content-addressable memory (CAM). CAM storage architectures have gained in popularity because their search time is O(1); that is, it is bounded by a single memory access. Binary CAMs allow only fixed-length comparisons and are therefore unsuitable for longest-prefix matching. The TCAM solves the longest-prefix problem and is by far the fastest hardware device for routing. In contrast to TCAMs, ASICs that use tries (digital trees for storing strings, in this case the prefixes) require four to six memory accesses for a single route lookup and thus have higher latencies.1 Also, TCAM-based routing-table updates have been faster than their trie-based counterparts.

The number of routing-table entries is increasing superlinearly.8 Today, routing tables have approximately 125,000 entries and will contain about 500,000 entries by 2005, so the need for optimal storage is also very important. Yet CAM vendors claim to handle a maximum of only 8,000 to 128,000 prefixes, taking allocators and deallocators into account.11 The gap between the projected numbers of routing-table entries and the low capacity of commercial products has given rise to work on optimizing the TCAM storage space by using the properties of lookup tables.3,4

However, even though TCAMs can store large numbers of prefixes, they consume large amounts of power, which limits their usefulness. Recently, Panigrahy and Sharma introduced a paged-TCAM architecture to reduce power consumption in TCAM routers.2 Their scheme partitions prefixes into eight groups of equal size; each group resides on a separate TCAM chip. A lookup operation can then select and enable only one of the eight chips to find a match for an incoming IP address. In addition, the approach introduces a paging scheme to enable only a set of pages within a TCAM. However, this approach achieves only marginal power savings at the cost of additional memory and lookup delay.

Other work describes two architectures, bit selection and trie-based, which use a paging scheme as the basis for a power-efficient TCAM.1 The bit-selection scheme extracts the 16 most significant bits of the IP address and uses a hash function to enable the lookup of a page in the TCAM chip. The approach assumes the prefix length to be from 16 to 24 bits. Prefixes outside this range receive special handling; the lookup searches for them separately. However, the number of such prefixes in today's routers is very large (65,068 for the bbnplanet router), and so this approach will result in significant power consumption.

MARCH-APRIL 2004


In addition, the partitioning scheme creates a trie structure for the routing-table prefixes and then traverses the trie to create partitions by grouping prefixes having the same subprefix. The subprefixes go into an index TCAM, which further indexes into static RAM to identify and enable the page in the data TCAM that stores the prefixes. The index TCAM is quite large for smaller page sizes and is a key factor in power consumption. The three-level architecture, though pipelined, introduces considerable delay.

It is important to note that both approaches1,2 store the entire routing table, which is unnecessary overhead in terms of memory and power. Although the existing approaches reduce power either by compacting the routing table or by selecting a portion of the TCAM, our approach reduces power by combining the two approaches and exploiting the observable properties of prefixes.

Prefix properties for compaction

The two purposes for delving into the prefix properties are

• to further compact the routing table beyond the compaction provided by existing techniques, and
• to derive an upper bound on minimization time to prevent it from becoming a bottleneck in the routing lookup.

Previous attempts to reduce the routing-table size using prefix properties relied on prefix overlapping and minimization techniques.3

Prefix overlapping achieves compaction by storing only one of many overlapping prefixes that have the same next hop. Two prefixes overlap if one is an enclosure of the other. Let Pi be a prefix in prefix set P, with |Pi| denoting the length of prefix Pi. Then Pi ∈ P is called an enclosure of Pj if Pj ∈ P, j ≠ i, |Pi| < |Pj|, and Pi is a subprefix of Pj. If Pi and Pj have the same next hop, then Pi can represent them both. Thus, a set of overlapping prefixes {P1, P2, P3, …, Pn}, with |P1| < |P2| < |P3| < … < |Pn| and all having the same next hop, is replaceable with the single entry P1. If an update deletes entry P1, entry P2 should then represent the set of overlapping prefixes {P2, …, Pn}. When an update adds a new entry Pi such that |Pi| > |P1|, the routing table should not change, except for an update of the set of overlapping prefixes.
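The prefix-overlapping rule can be sketched on binary prefix strings as follows; the helper names and the tiny table are illustrative, not the authors' implementation:

```python
def is_enclosure(pi: str, pj: str) -> bool:
    """Pi is an enclosure of Pj if Pi is a strictly shorter subprefix of Pj."""
    return len(pi) < len(pj) and pj.startswith(pi)

def compact_overlapping(table):
    """Keep only the shortest prefix of each overlapping set with the same
    next hop. `table` maps binary prefix strings to next-hop addresses."""
    kept = {}
    for prefix, hop in table.items():
        covered = any(
            is_enclosure(other, prefix) and table[other] == hop
            for other in table if other != prefix
        )
        if not covered:
            kept[prefix] = hop
    return kept

table = {
    "10000001": "4.0.6.142",          # short prefix, kept
    "1000000101000010": "4.0.6.142",  # enclosed, same next hop: dropped
    "1000000101000011": "9.1.1.1",    # enclosed but different next hop: kept
}
print(compact_overlapping(table))
```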

Prefix minimization logically reduces two or more prefixes to a minimal set, if these prefixes have the same next hop. The logic minimization is an NP-complete problem, solvable using the Espresso-II algorithm. Espresso-II can minimize prefixes {P1, P2, P3, …, Pn} to {P1, P2, P3, …, Pm}, such that m ≤ n.

Although the prefix overlapping and minimization techniques together compact about 30 to 45 percent of the routing table, these techniques have an overhead when it comes to prefixes that require fast updates. The time taken for prefix overlapping is bounded and independent of the router's size. However, the logic minimization algorithm using Espresso-II has a runtime that increases exponentially with input size. Based on Liu's techniques,3 the Espresso-II input can be as high as 15,146 prefixes for the attcanada router. Thus, the runtime for such a large amount of input data can be several minutes and is very expensive for incremental updates. We propose tackling this problem by establishing an upper bound, independent of router size, on the input to the minimization algorithm. We introduce another property, called prefix aggregation, to help us fix the upper bound. Based on this property, prefix minimization is time bounded.

Prefix aggregation

Figure 1 presents a portion of the routing-table traces from the bbnplanet router. It represents all the prefixes in the routing table starting with 129.66 and having prefix length l, for 16 < l ≤ 24. The number 129.66 is the largest common subprefix (LCS) for the set of those prefixes. The LCS for any prefix is the subprefix with a length nearest to a multiple of eight, such that |Si| < |Pi|, where Si is the LCS of prefix Pi, or LCS(Pi). Prefixes having different LCSs are least likely to have the same next hop. According to this observation, applying minimization to a set of prefixes having the same LCS should achieve a compaction nearly equal to that achieved by applying minimization to the entire lookup table.

So without sacrificing much compaction, the use of prefix aggregation can provide an upper bound on the number of possible prefixes for input to the minimization algorithm. This input set is the largest set of prefixes such that each prefix in the set has the same LCS. These prefixes, which have LCS value S, can have prefix lengths from |S| to |S| + 8. This range contains a maximum of 512 prefixes; however, not all these combinations are possible. For example, the assignment 120.255.0.0/12 precludes the combinations 120.255.0.0/13 through 120.255.0.0/16. So to calculate the maximum possible number of prefixes having the same LCS, we assign all combinations of 120.x.0.0/16 prefixes so that no assignments are possible for prefixes of length less than 16. Thus, a maximum of 256 prefixes can share the same LCS.
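The aggregation step above can be sketched as follows; `lcs` and `aggregate` are illustrative helper names, and the LCS is computed as the longest multiple-of-eight subprefix strictly shorter than the prefix itself:

```python
from collections import defaultdict

def lcs(bits: str) -> str:
    """Largest common subprefix: the longest multiple-of-eight
    subprefix strictly shorter than the prefix."""
    return bits[:((len(bits) - 1) // 8) * 8]

def aggregate(prefixes):
    """Group prefixes by LCS; each group is an independent,
    bounded input set for the minimizer."""
    groups = defaultdict(list)
    for p in prefixes:
        groups[lcs(p)].append(p)
    return dict(groups)

bits_129_66 = format(129, "08b") + format(66, "08b")   # the 129.66 LCS
prefixes = [bits_129_66 + format(6, "08b"),   # 129.66.6.0/24
            bits_129_66 + format(8, "08b"),   # 129.66.8.0/24
            bits_129_66 + "001"]              # 129.66.32.0/19
groups = aggregate(prefixes)
print(len(groups), [len(v) for v in groups.values()])  # -> 1 [3]
```

All three sample prefixes fall into a single group keyed by the 16-bit LCS of 129.66, matching the Figure 1 trace.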

Prefix expansion

Researchers have presented the property of prefix expansion for IP lookup using software approaches.12 We adopt this property to further compact the routing table and can represent the prefix expansion property as follows. If Pi represents a prefix such that |Pi| is not a multiple of eight, then the prefix expansion property expands Pi to Pi ∘ X^m, where X is a don't-care and m = 8 − (|Pi| mod 8). The operator ∘ represents the concatenation operation. For example, the prefix 100000010100 expands to 100000010100XXXX using the prefix expansion property.
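A one-line sketch of the expansion rule (the helper name is ours):

```python
def expand(prefix_bits: str) -> str:
    """Pad a prefix with don't-care X symbols up to the next
    multiple of eight bits."""
    m = -len(prefix_bits) % 8   # 8 - (|Pi| mod 8), or 0 if already aligned
    return prefix_bits + "X" * m

print(expand("100000010100"))   # -> 100000010100XXXX
```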

The Espresso-II algorithm will provide more compaction if all Pi are of the same length. However, the prefixes corresponding to the same LCS are not necessarily all the same length. Hence, to increase the compaction, we expand prefixes to a size that is the nearest multiple of eight.

Just as the number of inputs governs Espresso-II's runtime, so do the inputs' bit lengths. We have already limited the input set to the set of prefixes with the same LCS. We next observe that, in terms of Espresso-II's calculations, the LCS is redundant. So it is useful to further compact the prefixes by eliminating the LCS from each prefix and using the remainder, which is the least-significant 8 bits after prefix expansion.
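Combining expansion and LCS elimination, the input handed to the minimizer shrinks to expanded least-significant octets; this is a sketch under the assumption that all inputs share one LCS, with illustrative names:

```python
def expand(bits: str) -> str:
    """Pad with don't-cares to the next multiple of eight bits."""
    return bits + "X" * (-len(bits) % 8)

def minimizer_input(prefixes):
    """Reduce same-LCS prefixes to their expanded least-significant
    octets, the only bits the minimizer actually needs to see."""
    lcs_len = min((len(p) - 1) // 8 * 8 for p in prefixes)
    return [expand(p[lcs_len:]) for p in prefixes]

same_lcs = ["1000000101000010" + "00000110",  # 129.66.6.0/24
            "1000000101000010" + "00001000",  # 129.66.8.0/24
            "1000000101000010" + "001"]       # 129.66.32.0/19
print(minimizer_input(same_lcs))  # -> ['00000110', '00001000', '001XXXXX']
```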

    Routing-table compaction

During router initialization, our approach compacts the routing table using the property of prefix overlapping and also removes redundant entries (prefix minimization). We then use the property of prefix aggregation to form sets of prefixes based on their LCS values. Each of these sets goes through prefix expansion before serving as input to Espresso-II.

Prefix overlapping. Prefix overlapping applies to all the prefix entries in the routing table with the same next hop. The routing table then stores only one of the many overlapping prefixes. Figure 2 is the result of applying prefix overlapping to the trace in Figure 1. In this case, prefix overlapping reduces the table by seven entries.

Prefix                Next hop
129.66.6.0/24         4.0.6.142
129.66.8.0/24         4.0.6.142
129.66.12.0/24        4.0.6.142
129.66.20.0/24        4.0.6.142
129.66.21.0/24        4.0.6.142
129.66.30.0/23        4.0.6.142
129.66.31.0/24        4.0.6.142
129.66.32.0/19        4.0.6.142
129.66.34.0/24        4.0.6.142
129.66.47.0/24        4.0.6.142
129.66.48.0/24        4.0.6.142
129.66.64.0/18        4.0.6.142
129.66.88.0/24        4.0.6.142
129.66.95.0/24        4.0.6.142
129.66.111.0/24       4.0.6.142
129.66.128.0/22       4.0.6.142
129.66.132.0/24       4.0.6.142
129.66.172.0/24       4.0.6.142

Figure 1. Sample trace of routing table from bbnplanet.

Prefix                              Next hop
100000010100001000000110XXXXXXXX    4.0.6.142
100000010100001000001000XXXXXXXX    4.0.6.142
100000010100001000001100XXXXXXXX    4.0.6.142
100000010100001000010100XXXXXXXX    4.0.6.142
100000010100001000010101XXXXXXXX    4.0.6.142
10000001010000100001111XXXXXXXXX    4.0.6.142
1000000101000010001XXXXXXXXXXXXX    4.0.6.142
100000010100001001XXXXXXXXXXXXXX    4.0.6.142
1000000101000010100000XXXXXXXXXX    4.0.6.142
100000010100001010000100XXXXXXXX    4.0.6.142
100000010100001010101100XXXXXXXX    4.0.6.142

Figure 2. Result of applying prefix overlapping to Figure 1 trace.

Prefix minimization and prefix expansion. Figure 3 shows that prefix minimization and prefix expansion reduce the number of prefixes by nine. Figure 4 shows that prefix minimization without prefix expansion reduces the number of prefixes by eight.

Proposed architecture

The prefix properties reduce the IP lookup table's length (vertically) as described earlier. Here, we propose an architectural technique that reduces the IP lookup table laterally. This technique adopts the two-level routing-lookup architecture in Figure 5.

Level 1 consists of a decoder and a reconfigurable page-enable block (PEB). The PEB is a combinational circuit consisting of a driver network to drive the output port, which connects directly to TCAM pages through page-enable lines. The size of the PEB's output port (in bits) equals the number of pages in the TCAM. The PEB is also reconfigurable to accommodate changes during a page overflow. Level 2 is a 256-segment TCAM array. In addition to the assigned pages, the TCAM array contains many empty pages; an implementation will use these pages for memory management.

For each incoming IP address, the first octet (A) goes to the decoder-PEB, which enables only one segment in the TCAM array. The decoder-PEB then sends page-enable signals to each TCAM page. The TCAM then compares the enabled pages with the next three incoming octets (B, C, and D) of the IP address.

The overhead of the decoder-PEB hardware is insignificant compared to that of the TCAM array. Avoiding the storage of the first octet in the TCAM array also reduces the total TCAM storage space by 25 percent. This savings is significant, especially when routing tables are as large as 125,000 entries. Also, the architecture enables only one segment of the TCAM array at any time, significantly reducing power consumption.
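The first-octet dispatch can be sketched in software as follows. This is a behavioral model only, with illustrative names; it assumes each segment holds 24-bit ternary patterns sorted longest prefix first:

```python
def lookup(ip: str, segments):
    """segments: 256 lists of (24-bit ternary pattern, next_hop) pairs,
    each sorted longest prefix first. Octet A selects the one enabled
    segment; B, C, and D are matched against its 24-bit words."""
    a, b, c, d = (int(x) for x in ip.split("."))
    key = format(b, "08b") + format(c, "08b") + format(d, "08b")
    for pattern, hop in segments[a]:          # only segments[a] is enabled
        if all(p in ("X", k) for p, k in zip(pattern, key)):
            return hop                        # ternary match, longest first
    return None

segments = [[] for _ in range(256)]
# 129.66.6.0/24 goes to segment 129, stored without its first octet
segments[129] = [(format(66, "08b") + format(6, "08b") + "X" * 8,
                  "4.0.6.142")]
print(lookup("129.66.6.9", segments))   # -> 4.0.6.142
print(lookup("10.0.0.1", segments))     # -> None
```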

The lookup latency is the TCAM read access time plus small gate delays because of the decoder-PEB hardware. But gate delays should be less than the memory access time, so the TCAM's access time would limit lookup throughput. To increase the lookup throughput, the architecture could select more than one segment in the TCAM and simultaneously perform multiple IP lookups. This is possible by duplicating the decoder-PEB logic in level 1 and resolving additional IP prefixes in the TCAM array. Similarly, concurrent lookup and update are also feasible using such an architecture. However, the more TCAM segments the lookup enables, the more power the TCAM consumes, so a designer must trade off power consumption and throughput.

Empirical model for memory and power

We now present empirical formulas to compute the total memory requirement and power consumption of our proposed architecture. Let rows_i represent the number of entries in the ith TCAM segment, and tagbits the TCAM word length (24 bits). The proposed architecture's minimum memory requirement (for 32-bit entries) is

M(rows, tagbits) = Σ_{i=1}^{256} (rows_i × tagbits/32)    (1)

We base our empirical model for power estimation on the number of entries enabled during the lookup process. Based on this model, the power consumption (for 32-bit entries) in a TCAM with its ith segment enabled is


Prefix entries                      Next hop
100000010100001010101100XXXXXXXX    4.0.6.142
100000010100001000011111XXXXXXXX    4.0.6.142
100000010100001000001X00XXXXXXXX    4.0.6.142
10000001010000100001010XXXXXXXXX    4.0.6.142
100000010100001010000100XXXXXXXX    4.0.6.142
100000010100001000000110XXXXXXXX    4.0.6.142
10000001010000100001111XXXXXXXXX    4.0.6.142
1000000101000010100000XXXXXXXXXX    4.0.6.142
1000000101000010001XXXXXXXXXXXXX    4.0.6.142
100000010100001001XXXXXXXXXXXXXX    4.0.6.142

Figure 4. Result of applying prefix minimization without prefix expansion to Figure 1 trace.

Prefix entries                      Next hop
1000000101000010X0101100XXXXXXXX    4.0.6.142
10000001010000100XX1010XXXXXXXXX    4.0.6.142
10000001010000100XX01X00XXXXXXXX    4.0.6.142
100000010100001010000X00XXXXXXXX    4.0.6.142
10000001010000100XX00110XXXXXXXX    4.0.6.142
10000001010000100XX1111XXXXXXXXX    4.0.6.142
1000000101000010100000XXXXXXXXXX    4.0.6.142
10000001010000100X1XXXXXXXXXXXXX    4.0.6.142
100000010100001001XXXXXXXXXXXXXX    4.0.6.142

Figure 3. Result of applying prefix minimization and prefix expansion to Figure 1 trace.


P(rows, tagbits) = k × (rows_i × tagbits/32)    (2)

where k is the power constant. The largest segment in the TCAM array determines the maximum power consumption for the lookup process.
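A numeric sketch of equations 1 and 2, using the bbnplanet segment size reported later (5,596 prefixes) and a hypothetical power constant k = 1:

```python
TAGBITS = 24   # word length after dropping the first octet
K = 1.0        # power constant k (hypothetical value)

def memory_32bit_entries(rows):
    """Equation 1: total memory, summed over all 256 segments,
    expressed in 32-bit entries."""
    return sum(r * TAGBITS / 32 for r in rows)

def power(rows_i):
    """Equation 2: power when the segment with rows_i entries is enabled."""
    return K * rows_i * TAGBITS / 32

rows = [0] * 256
rows[129] = 5596                      # one populated segment, for illustration
print(memory_32bit_entries(rows))     # -> 4197.0
print(power(max(rows)))               # largest segment bounds worst-case power
```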

Fast incremental update

Approximately 100 to 1,000 updates per second take place in core routers today.13 Thus, the update operation should be incremental and fast to avoid becoming a bottleneck in the search operation.

A router receives information (routing updates) about route changes from its nearby routers. During each routing update (an insertion or deletion), we apply compaction techniques to avoid any redundant storage in the TCAM. However, it is possible that more than one prefix in the TCAM will need updating, a situation called a TCAM update.14 In Liu's technique, for a routing update, compaction would require the minimization of about 10,000 prefixes at a time. Minimizing such a large set of prefixes introduces two types of delays: The first delay comes from Espresso-II's computation time. The second delay comes from TCAM entry updates. If M is the size of the prefix set to be minimized, then it takes O(M) time for a TCAM insert operation. This is because the TCAM must scan the minimized set of prefixes and update only those prefixes that have changed. Deleting an entry from the TCAM is even costlier, taking O(M²) time. In our technique, because the maximum value of M is 256 (rather than 10,000), the insertion and deletion steps are fast and truly incremental.

Figure 5. TCAM architecture for routing lookup. [The figure shows the first octet entering the level-1 decoder and reconfigurable PEB, which drives 256 page-enable lines (PE1, PE2, PE3, ..., PEi, ..., PEi+n); octets B, C, and D feed the 24-bit-wide pages of the enabled segment in the level-2 TCAM array.]

For any routing update, our approach restricts TCAM updates to a segment. So it is possible to update several segments simultaneously by modifying the decoder-PEB unit. Our technique can achieve a higher number of updates per second than a single TCAM chip supports by using several TCAM chips and placing each on a separate bus.

In our architecture, it is possible for the segment size to grow during an update, possibly leading to an overflow. Rebuilding the entire routing table would take care of the overflow. However, a rebuild is time consuming, so unsuitable for fast lookup. Providing empty pages in the TCAM array will help manage the overflow. Reconfiguring the PEB on the fly will permit an empty page to become part of the overflowing segment. This reconfiguration takes microseconds.

Experimental setup and results

We simulated our approach using a chipset containing two 1.6-GHz AMD Athlons to compute the prefix minimizations. The prefix traces for minimization came from four routers (is, utah, bbnplanet, and attcanada), and we evaluated memory compaction, minimization time, and worst-case power consumption. For memory compaction and minimization time, we compared our approach to Liu's.3 We also compared our worst-case power consumption to the results of Liu;3 Zane, Narlikar, and Basu;1 and Panigrahy and Sharma.2

Memory compaction

Table 1 indicates the compaction achieved by prefix overlapping. For example, the case of attcanada shows a 29-percent compaction.

Table 2 shows the compaction from our prefix minimization approach and compares it to Liu's results. The second column shows the number of prefixes remaining if we were to minimize as many prefixes as possible in the given routing table. Our approach and Liu's group the prefixes before minimization; columns three and four show the number of prefixes remaining after minimization for each approach. Although neither approach eliminates all the prefixes possible, each significantly compacts the routing tables.

However, for large routing tables, we could not determine the prefix minimization of Liu's approach, even after running the simulation for two days. Also, our approach compacts the smaller routing tables of is and utah more than Liu's approach, because our approach uses prefix expansion. In fact, our approach achieves more than 33 percent compaction for the attcanada, bbnplanet, and is routers, as column five of Table 2 shows.

Table 1. Compaction using prefix overlapping.

                 No. of prefixes
Router       Before compaction   After compaction   Compaction (percentage)
is                    801               648                  19.1
utah                  145               133                   8.3
attcanada         112,412            79,769                  29.0
bbnplanet         124,538            92,774                  25.5

Table 2. Compaction using prefix minimization.

             No. of prefixes
             After minimizing   After prefix       After prefix       Compaction of
             all possible       minimization       minimization       our approach
Router       prefixes           (our approach)     (Liu's approach)   (percentage)
is                434                505                562                36.9
utah              123                123                132                15.2
attcanada       Hangs*             70,368             Hangs*               37.4
bbnplanet       Hangs*             83,468             Hangs*               33.0

* The minimization processes do not complete within two days.

We also determined the compaction of both approaches when they include prefix overlapping followed by prefix minimization. Column 2 of Table 3 shows the number of entries remaining after our approach minimizes every prefix possible; we compute this number using equation 1. Column 3 shows the results that Liu reported earlier. Column 4 indicates our approach's total routing-table compaction, and column 5 shows the percentage improvement in compaction over Liu's results. Our approach offers significant improvements because it exploits the compaction opportunities inherent in our architecture and the prefix properties.

Minimization time

Table 4 shows the runtime for minimizing the prefixes using Espresso-II in both approaches. The results indicate the maximum possible time that the 1.6-GHz Athlon dual-processor chipset takes to minimize the prefixes when the neighboring router requests the addition of a new prefix. The worst-case minimization time for Liu's approach was 1,098.47 seconds for the attcanada router (15,146 prefixes), which is very expensive for incremental updates. Our approach, on the other hand, took a worst-case time of 0.006 seconds for bbnplanet. This value is not only small and practical for real-life updates, but also bounded, because at any point in time the number of inputs to Espresso-II will never exceed 256. Espresso-II's runtime also increases with the number of bits in the prefixes under consideration. Our approach uses eight bits as opposed to the 32 bits for Liu's approach, further reducing runtime.

Table 3. Compaction using prefix minimization and overlapping.

             No. of prefixes                   Our approach's
Router       Our approach   Liu's approach    compaction (percentage)   Improvement (percentage)
is                348            460                  56.6                     24.7
utah               87            123                  40.0                     62.1
attcanada      43,378         54,476                  61.4                     15.2
bbnplanet      53,625         69,646                  56.9                     22.6

Table 4. Timing analysis for minimization.

             Our approach                        Liu's approach
Router       Time (seconds)   Prefixes updated   Time (seconds)   Prefixes updated
is               0.003               34               0.159              373
utah             0.002                8               0.008               44
attcanada        0.005              143           1,098.47            15,146
bbnplanet        0.006              171              63.04             7,580

Worst-case power consumption

TCAM power consumption is proportional to the number of bits enabled in the TCAM during the search operation.1 Table 5 shows the worst-case power consumption (32-bit equivalent) for several approaches. Liu's approach enables the entire TCAM (N prefixes) during every lookup operation, making it very costly. Panigrahy and Sharma's approach enables a large number of entries (16,000)2 and 8,000 group IDs. In the bitmap technique, although the number of entries enabled during search (Cmax) is very small, the entries with prefixes of fewer than 16 and more than 24 bits are always enabled during every lookup operation. The number of these entries, Pb, is very large (65,068) for the bbnplanet router. The partition technique uses an index TCAM and a data TCAM. It enables only a page of size N/b in the data TCAM, where b is the number of pages in the TCAM. For every lookup, however, this technique searches the entire index TCAM of size Sd, which is large if the page size is small. When the page size is large, the index TCAM has few entries; however, it is the size of the page enabled in the data TCAM that makes the main contribution to power consumption. Our approach enables only a segment (Si) of the TCAM, which is quite small compared to the entire TCAM. For example, in the bbnplanet router, the enabled segment has about 5,600 prefixes. Because most segments in our TCAM are very small, the average power consumption is quite small. It might be important to mention that the word length of the TCAM in our architecture is 24 bits, as opposed to the 32 bits in conventional TCAMs.

Table 6 shows the worst-case power consumption per lookup in terms of the number of bits enabled; we calculate this value using equation 2. Our approach saves significant power compared to approaches that enable entire TCAMs.
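The savings column of Table 6 follows from comparing enabled bits (entries times the 24-bit word) against the total TCAM bits (entries times a conventional 32-bit word). A quick check against the bbnplanet and is rows, with an illustrative helper:

```python
def savings(enabled_entries, total_entries,
            word_enabled=24, word_total=32):
    """Percentage of TCAM bits left unpowered during a lookup."""
    enabled = enabled_entries * word_enabled
    total = total_entries * word_total
    return round(100 * (1 - enabled / total))

print(savings(5596, 69646))   # bbnplanet -> 94
print(savings(205, 460))      # is        -> 67
```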

The proposed architecture enables only a small portion of the TCAM array during the lookup operation and achieves significant power savings. Results indicate that such an architecture holds promise in supporting storage- and energy-efficient router designs. We plan to further reduce the update time, and thereby increase the throughput of incremental updates, using a new minimization approach (other than Espresso-II) that requires less minimization time. MICRO

References

1. F. Zane, G. Narlikar, and A. Basu, "CoolCAMs: Power-Efficient TCAMs for Forwarding Engines," Proc. IEEE Infocom 2003, IEEE Press, 2003, pp. 42-52.
2. R. Panigrahy and S. Sharma, "Reducing TCAM Power Consumption and Increasing Throughput," Proc. 10th Symp. High-Performance Interconnects (HOTI 02), IEEE CS Press, 2002, pp. 107-112.
3. H. Liu, "Routing Table Compaction in Ternary CAM," IEEE Micro, vol. 22, no. 1, Jan.-Feb. 2002, pp. 58-64.
4. M.J. Akhbarizadeh and M. Nourani, "An IP Packet Forwarding Technique Based on Partitioned Lookup Table," Proc. IEEE Int'l Conf. Comm. (ICC 02), IEEE Press, 2002, pp. 2263-2267.
5. R.K. Brayton et al., Logic Minimization Algorithms for VLSI Synthesis, Kluwer Academic Publishers, 1984.
6. S. Nilsson and G. Karlsson, "IP-Address Lookup Using LC-Tries," IEEE J. Selected Areas Comm., vol. 17, no. 6, June 1999, pp. 1083-1092.
7. M. Waldvogel et al., "Scalable High-Speed IP Routing Table Lookups," Computer Comm. Rev., vol. 27, no. 4, Oct. 1997, pp. 25-36.
8. P. Gupta, Algorithms for Routing Lookups and Packet Classification, doctoral dissertation, Dept. Computer Science, Stanford Univ., 2000.
9. P. Gupta, S. Lin, and N. McKeown, "Routing Lookups in Hardware at Memory Access Speeds," Proc. IEEE Infocom 1998, IEEE Press, 1998, pp. 1240-1247.
10. M. Kobayashi, T. Murase, and A. Kuriyama, "A Longest Prefix Match Search Engine for Multigigabit IP Processing," Proc. IEEE Int'l Conf. Comm. (ICC 00), IEEE Press, 2000, pp. 1360-1364.
11. S. Sikka and G. Varghese, "Memory-Efficient State Lookups with Fast Updates," Proc. ACM Special Interest Group on Data Comm. (SIGCOMM 00), ACM Press, 2000, pp. 335-347.
12. V. Srinivasan and G. Varghese, "Fast Address Lookups Using Controlled Prefix Expansion," ACM Trans. Computer Systems, vol. 17, no. 1, Oct. 1999, pp. 1-40.

Table 5. Worst-case bounds on the number of enabled entries.

Approach                 Worst-case no. of enabled entries
Liu                      O(N)
Panigrahy and Sharma     O(16,000 + p × 6/32)
Bitmap                   O(Cmax + Pb)
Partition                O(N/b + Sd)
Proposed                 O(Si)

Table 6. Relative power consumption per lookup.

             No. of            Total no. of bits   Savings
Router       enabled bits      in the TCAM         (percentage)
is               205 × 24           460 × 32            67
utah              17 × 24           123 × 32            90
attcanada      4,582 × 24       544,476 × 32            99
bbnplanet      5,596 × 24        69,646 × 32            94

13. C. Labovitz, G.R. Malan, and F. Jahanian, "Internet Routing Instability," IEEE/ACM Trans. Networking, vol. 6, no. 5, Oct. 1998, pp. 515-528.

14. D. Shah and P. Gupta, "Fast Updating Algorithms for TCAMs," IEEE Micro, vol. 21, no. 1, Jan.-Feb. 2001, pp. 36-47.

Ravikumar V.C. is a system designer at Hewlett-Packard. His research interests include networking, operating systems, and embedded systems. He has an MS degree in computer science from Texas A&M University.

Rabi N. Mahapatra is an associate professor in the Department of Computer Science at Texas A&M University. His research interests include embedded-system codesign, systems on chips, VLSI design, and system architectures. Mahapatra has a PhD from the Indian Institute of Technology, Kharagpur. He is a senior member of the IEEE Computer Society.

Direct questions and comments to Rabi N. Mahapatra, Dept. of Computer Science, Texas A&M Univ., College Station, TX 77845; [email protected].

For further information on this or any other computing topic, visit our Digital Library at http://www.computer.org/publications/dlib.
