67886 - Switch and Router Design - TNG Presentation · 2017-08-28

67886 - Switch and Router Design Dr. David Hay Ross 78b [email protected] Source: Nick Mckeown, Isaac Keslassy


Page 1:

67886 - Switch and Router Design

[email protected]

Source: Nick McKeown, Isaac Keslassy

Page 2:

Bottlenecks: Memory, memory, …

[Figure: diagram with points labeled 1, 2, 3]

Page 3:

Packet Processing Examples

• Address Lookup (IP/Ethernet)
  – Where to send an incoming packet?
  – "Use output-port 3 to send packets to MAC address 01:23:45:67:89:ab" (Exact Match)
  – "Use output-port 4 to send packets to destination network 111.15/16" (Longest Prefix Match)

• Firewall, ACL
  – Which packets to accept or deny?
  – "Drop all packets from evil source network 66.66/16 on ports 6-666"
  – Usually needs 5 fields: source-address, dest-address, source-port, dest-port, protocol
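The 5-field firewall rule above can be sketched as a first-match scan over a rule list. This is only an illustration: the `Rule` type, the `classify` function, the default action, and the reading of "ports 6-666" as destination ports are all assumptions, not part of the slides.

```python
import ipaddress
from typing import NamedTuple, Optional

class Rule(NamedTuple):
    src_net: str             # source network, CIDR notation
    dst_net: str             # destination network, CIDR notation
    src_ports: range         # matching source ports
    dst_ports: range         # matching destination ports
    protocol: Optional[str]  # None means "any protocol"
    action: str              # "accept" or "deny"

ANY_PORT = range(0, 65536)

# The slide's example rule; 66.66/16 is written out as 66.66.0.0/16 and
# "ports 6-666" is assumed to mean destination ports.
RULES = [
    Rule("66.66.0.0/16", "0.0.0.0/0", ANY_PORT, range(6, 667), None, "deny"),
]

def classify(src, dst, sport, dport, proto, rules=RULES, default="accept"):
    """Return the action of the first rule matching all five fields."""
    for r in rules:
        if (ipaddress.ip_address(src) in ipaddress.ip_network(r.src_net)
                and ipaddress.ip_address(dst) in ipaddress.ip_network(r.dst_net)
                and sport in r.src_ports
                and dport in r.dst_ports
                and r.protocol in (None, proto)):
            return r.action
    return default
```

A linear scan like this costs one rule comparison per entry, which is exactly the kind of per-packet work the rest of the deck tries to avoid.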

Page 4:

Packet Processing Examples

• Intrusion Detection Schemes
  – Deep Packet Inspection (DPI): "Drop all packets that contain the string EvilWorm anywhere within the packet"
  – SNORT rule set

Page 5:

Packet Processing Rate

Year   Line       40B packets (Mpkt/s)
1997   622 Mb/s   1.94
1999   2.5 Gb/s   7.81
2001   10 Gb/s    31.25
2003   40 Gb/s    125

1. Lookup mechanism must be simple and easy to implement
2. (Surprise?) Memory access time is the long-term bottleneck
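The packet rates in the table follow from dividing the line rate by the size of a minimum-length (40-byte) packet. A small sketch of that arithmetic, with hypothetical function names:

```python
def packets_per_second(line_rate_bps, packet_bytes=40):
    """Packets per second when the link is saturated with fixed-size packets."""
    return line_rate_bps / (packet_bytes * 8)

def ns_per_packet(line_rate_bps, packet_bytes=40):
    """Time budget per packet, in nanoseconds."""
    return packet_bytes * 8 / line_rate_bps * 1e9

for year, label, rate in [(1997, "622 Mb/s", 622e6), (1999, "2.5 Gb/s", 2.5e9),
                          (2001, "10 Gb/s", 10e9), (2003, "40 Gb/s", 40e9)]:
    mpps = packets_per_second(rate) / 1e6
    print(f"{year}  {label:>9}  {mpps:7.2f} Mpkt/s  {ns_per_packet(rate):7.1f} ns/packet")
```

The per-packet budget at 40 Gb/s is a handful of nanoseconds, comparable to a single SRAM access, which is why memory access time becomes the bottleneck.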

Page 6:

Memory Technology (2003-04)

Technology        Single chip density   $/chip ($/MByte)          Access speed   Watts/chip
Networking DRAM   64 MB                 $30-$50 ($0.50-$0.75)     40-80 ns       0.5-2 W
SRAM              4 MB                  $20-$30 ($5-$8)           4-8 ns         1-3 W
TCAM              1 MB                  $200-$250 ($200-$250)     4-8 ns         15-30 W

Note: Price, speed and power are manufacturer and market dependent.

Numbers are a bit outdated but give the general idea

Page 7:

Simplest Task: Exact Matching

• Mostly in bridges
  – Bridges work in layer 2 (Ethernet)
  – A bridge connects two Ethernet networks
  – Wire-speed forwarding:
    • Each time a packet arrives at a bridge, forward it according to the destination MAC address
    • Also store/update the source MAC address (learning)
    • Should be done at wire speed

[Figure: a bridge, with labels a, b, c, d]
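The forward-and-learn behaviour described above can be sketched with a dictionary as the exact-match table. The class name, the flooding of unknown destinations, and the filtering of frames whose destination sits on the arrival port are assumptions taken from common bridge behaviour, not from the slide.

```python
class LearningBridge:
    """Toy model: an exact-match table from MAC address to port."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}  # MAC address -> port on which it was last seen

    def receive(self, src_mac, dst_mac, in_port):
        """Return the list of ports the frame is forwarded to."""
        # Learning: remember (or refresh) where the source lives.
        self.table[src_mac] = in_port
        # Forwarding: exact match on the destination address.
        out_port = self.table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every other port (assumption).
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination is on the arrival segment: nothing to forward.
            return []
        return [out_port]
```

Both the learning step and the forwarding step are one exact-match operation each, so both have to fit in the per-packet time budget.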

Page 8:

Solution 1: Binary Search

• MAC addresses have values which can be sorted
• Thus, when keeping them sorted, one can perform a binary search on the array and find the right MAC address
• However, each iteration is a memory access → log N memory accesses → works fine (even using DRAM) for low speeds and small N (around 10 Mb/s, 8K values) but doesn't scale to large N / higher speeds (not even for 100 Mb/s, 64K values)
• Using faster hardware (SRAM) won't really solve the problem (and it is more expensive…)
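To make the cost concrete, here is a minimal sketch of binary search over a sorted MAC table that counts one memory access per probe. Addresses are modelled as plain integers, and the function name is hypothetical.

```python
def binary_search(sorted_macs, target):
    """Return (index or None, number of table probes)."""
    lo, hi = 0, len(sorted_macs) - 1
    accesses = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        accesses += 1  # each probe of the table is one memory access
        if sorted_macs[mid] == target:
            return mid, accesses
        if sorted_macs[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None, accesses

# 64K distinct "addresses"; the probe count grows with log2(N).
table = list(range(0, 2 * 65536, 2))
index, accesses = binary_search(table, table[12345])
print(index, accesses)
```

Because the probes are sequentially dependent, they cannot be overlapped: the lookup latency is the probe count times the memory access time.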

Page 9:

Scaling using Hashing

• Hashing is much faster than binary search on average, but much slower in the worst case (up to linear time…)
• However, one can choose (pre-compute) good hash functions, so that the number of collisions is small and bounded
  – Precomputation takes a lot of time, but addresses are not added at a rapid rate
  – Applying the hash function is done at wire speed
• More sophisticated data structures / hashing techniques can also be applied (e.g. to reduce memory)
  – Bloom filters, fingerprinting, etc.

Page 10:

Example (Gigaswitch, 1994)

• N = 64K; binary search takes 16 memory accesses
• For each 48-bit address addr, we first apply h(addr) to get a 48-bit value:
  – The 16 LSB are the hash-table entry index (64K entries)
  – Each entry is a balanced binary tree of height at most 3, sorted by the remaining 32 MSB
  – The hash function should guarantee that no more than 8 addresses are in the same tree, and that we can disambiguate between addresses using the 32 MSB
• Solve corner cases separately (CAM); rehashing
  – 4 memory accesses
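The index/key split described above can be sketched as follows. This is not the Gigaswitch's actual hash: `h` is a stand-in (an invertible xor-shift followed by multiplication by an arbitrary odd constant modulo 2^48), a sorted list searched with `bisect` stands in for the balanced tree, and a dictionary stands in for the overflow CAM. Because the stand-in `h` is a bijection on 48-bit values, two addresses that share the 16 LSB necessarily differ in the 32 MSB.

```python
import bisect

MASK48 = (1 << 48) - 1
MAX_BUCKET = 8  # at most 8 addresses per tree, as on the slide

def h(addr, multiplier=0x9E3779B97F4B):
    """Stand-in 48-bit -> 48-bit bijection (assumption, not the real hash)."""
    x = addr & MASK48
    x ^= x >> 16                     # invertible mixing step
    return (x * multiplier) & MASK48  # odd multiplier: invertible mod 2**48

class HashedMacTable:
    def __init__(self):
        self.buckets = {}  # 16 LSB of h(addr) -> sorted list of (32 MSB, port)
        self.cam = {}      # overflow entries ("corner cases")

    def _locate(self, addr):
        value = h(addr)
        index, key = value & 0xFFFF, value >> 16
        bucket = self.buckets.setdefault(index, [])
        pos = bisect.bisect_left([k for k, _ in bucket], key)
        found = pos < len(bucket) and bucket[pos][0] == key
        return bucket, pos, key, found

    def insert(self, addr, port):
        bucket, pos, key, found = self._locate(addr)
        if found:
            bucket[pos] = (key, port)
        elif len(bucket) < MAX_BUCKET:
            bucket.insert(pos, (key, port))
        else:
            self.cam[addr] = port  # bucket full: fall back to the CAM

    def lookup(self, addr):
        bucket, pos, _, found = self._locate(addr)
        return bucket[pos][1] if found else self.cam.get(addr)
```

In the design on the slide, a full bucket can also trigger rehashing with a different function; this sketch only models the CAM fallback.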

Page 11:

IP Longest Prefix Matching

Packet: Destination = 12.5.9.16 | payload

IP Forwarding Table

Prefix         Interface      Next Hop         Match for 12.5.9.16
0.0.0.0/0      10.14.11.33    Output-port 1    OK
12.0.0.0/8     10.14.22.19    Output-port 2    better
12.4.0.0/15    10.1.3.77      Output-port 3    even better
12.5.8.0/23    10.1.3.23      Output-port 4    best!
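The "OK / better / best" ranking can be reproduced with a naive scan that keeps the longest matching prefix. This sketch uses Python's standard `ipaddress` module and only the four prefixes from the table; the function name is hypothetical.

```python
import ipaddress

PREFIXES = ["0.0.0.0/0", "12.0.0.0/8", "12.4.0.0/15", "12.5.8.0/23"]

def longest_prefix_match(dest, prefixes=PREFIXES):
    """Return the most specific network containing dest, or None."""
    addr = ipaddress.ip_address(dest)
    best = None
    for text in prefixes:
        net = ipaddress.ip_network(text)
        # Keep the match with the largest prefix length seen so far.
        if addr in net and (best is None or net.prefixlen > best.prefixlen):
            best = net
    return best

print(longest_prefix_match("12.5.9.16"))  # all four prefixes match; /23 wins
```

The scan touches every entry, so it only illustrates the definition; it is not a wire-speed algorithm.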

Page 12:

Longest Prefix Match is Harder than Exact Match

• The destination address of an arriving packet does not carry with it the information to determine the length of the longest matching prefix
• Hence, one needs to search among the space of all prefix lengths, as well as the space of all prefixes of a given length

Page 13:

Current Practical Data

• Caching works poorly in backbone routers
  – 250,000 concurrent flows
• Wire-speed lookup needed for 40-byte packets
  – 50% are TCP acks
  – 32 ns/packet at 10 Gb/s and 8 ns/packet at 40 Gb/s
• Lookup dominated by memory accesses → speed is measured by memory accesses
• Prefix length 8-32
• Today 150,000 prefixes → with growth, 1 million prefixes
• Higher speeds need SRAM → worth minimizing memory

Page 14:

Problem Definition

Routing entries:
192.2.0/22, R2
192.2.2/24, R3
200.11.0/22, R4

[Figure: address line showing the ranges 192.2.0/22 (which contains 192.2.2/24) and 200.11.0/22, with example addresses 192.2.0.1, 192.2.2.100 and 200.11.0.33]

LPM: Find the most specific route, or the longest matching prefix among all the prefixes matching the destination address of an incoming packet

Page 15:

LPM in IPv4

Use 32 exact match algorithms for LPM!

[Figure: the network address is fed to 32 exact-match units (against prefixes of length 1, length 2, ..., length 32); a priority encoder picks among the matches and outputs the port]

We can start with prefix length 8
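One way to picture the "32 exact matches" idea is one hash table per prefix length, with the longest-first scan playing the role of the priority encoder. In hardware the 32 matches would run in parallel; the loop below is only a software sketch. The class name is hypothetical, and the sample routes are the three from the problem-definition slide written with full dotted quads.

```python
import ipaddress

class PerLengthLPM:
    """One exact-match table per prefix length (1..32)."""

    def __init__(self):
        self.tables = {length: {} for length in range(1, 33)}

    def add(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        # The key is the prefix's significant bits only (lengths 1..32).
        key = int(net.network_address) >> (32 - net.prefixlen)
        self.tables[net.prefixlen][key] = next_hop

    def lookup(self, dest):
        addr = int(ipaddress.ip_address(dest))
        # Longest length first: the first hit is the longest match.
        for length in range(32, 0, -1):
            hit = self.tables[length].get(addr >> (32 - length))
            if hit is not None:
                return hit
        return None

lpm = PerLengthLPM()
lpm.add("192.2.0.0/22", "R2")
lpm.add("192.2.2.0/24", "R3")
lpm.add("200.11.0.0/22", "R4")
```

Starting the scan at length 8, as the slide suggests, would simply narrow the range of lengths that need a table.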

Page 16:

Metrics for Lookup Algorithms

• Speed (= number of memory accesses)
• Storage requirements (= amount of memory)
• Low update time
• Scalability
  – With length of prefix: IPv4 unicast (32b), Ethernet (48b), IPv4 multicast (64b), IPv6 unicast (128b)
  – With size of routing table: (sweet spot for today's designs = 1 million)
• Flexibility in implementation
• Low preprocessing time

Page 17:

Our Toy Example

P1 = 101*
P2 = 111*
P3 = 11001*
P4 = 1*
P5 = 0*
P6 = 1000*
P7 = 100000*
P8 = 100*
P9 = 110*

Packet: 128.0.0.1 → 100..001 → matches P4, P6, P7, P8 → Forward to P7

Page 18:

Unibit (= Radix) Tries

[Figure: trie node layout with three fields: 0-pointer, 1-pointer, prefix]

Page 19:

Unibit Tries

[Figure: unibit trie for prefixes P1-P9; each edge is labeled with one bit, nodes without a prefix are shown in parentheses]

root
  0: P5
  1: P4
    0: (10)
      0: P8
        0: P6
          0: (10000)
            0: P7
      1: P1
    1: (11)
      0: P9
        0: (1100)
          1: P3
      1: P2

Page 20:

Compacting One-Way Branches (variant of PATRICIA tree)

[Figure: the same trie with one-way branches compacted into multi-bit edges]

root
  0: P5
  1: P4
    0: (10)
      0: P8
        0: P6
          00: P7
      1: P1
    1: (11)
      0: P9
        01: P3
      1: P2

Page 21:

Unibit Tries – Running Example

[Figure: the compacted trie for prefixes P1-P9, lookup step 1]

Input: 1001    Memory: null

Page 22:

Unibit Tries – Running Example

[Figure: the compacted trie for prefixes P1-P9, lookup step 2]

Input: 1001    Memory: P4

Page 23:

Unibit Tries – Running Example

[Figure: the compacted trie for prefixes P1-P9, lookup step 3]

Input: 1001    Memory: P4

Page 24:

Unibit Tries – Running Example

[Figure: the compacted trie for prefixes P1-P9, lookup step 4]

Input: 1001    Memory: P4 P8
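The running example can be replayed with a small unibit trie. This is a sketch of the plain (uncompacted) trie, using the node layout of 0-pointer, 1-pointer and prefix; the `Node`, `build` and `lookup` names are illustrative. The `memory` list mirrors the "Memory:" annotation on the slides: every prefix passed on the way down is remembered, and the last one is the answer.

```python
class Node:
    def __init__(self):
        self.child = [None, None]  # [0-pointer, 1-pointer]
        self.prefix = None         # prefix stored at this node, if any

PREFIXES = {"P1": "101", "P2": "111", "P3": "11001", "P4": "1", "P5": "0",
            "P6": "1000", "P7": "100000", "P8": "100", "P9": "110"}

def build(prefixes):
    root = Node()
    for name, bits in prefixes.items():
        node = root
        for b in bits:
            i = int(b)
            if node.child[i] is None:
                node.child[i] = Node()
            node = node.child[i]
        node.prefix = name
    return root

def lookup(root, bits):
    """Return (longest matching prefix or None, prefixes seen on the way)."""
    node, memory = root, []
    for b in bits:
        node = node.child[int(b)]
        if node is None:
            break  # fell off the trie: the last remembered prefix wins
        if node.prefix is not None:
            memory.append(node.prefix)
    return (memory[-1] if memory else None), memory

root = build(PREFIXES)
print(lookup(root, "1001"))
```

Each loop iteration follows one pointer, i.e. one memory access per address bit, which is the O(W) lookup cost.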

Page 25:

Unibit Tries - Analysis

• W-bit prefixes, N prefixes: O(W) lookup, O(NW) storage and O(W) update complexity
• Patricia: O(N) storage (why?)
• Still slow, high memory, but:
  – Simple
  – Extensible to wider fields

Page 26:

Multi-bit Tries

Binary trie: Depth = W, Degree = 2, Stride = 1 bit

Multi-ary trie: Depth = W/k, Degree = 2^k, Stride = k bits

Principle: Trade Memory for Speed

Page 27:

Prefix Expansion with Multi-bit Tries

If stride = k bits, prefix lengths that are not a multiple of k need to be expanded

E.g., k = 2:

Prefix   Expanded prefixes
0*       00*, 01*
11*      11*

Maximum number of expanded prefixes corresponding to one non-expanded prefix = 2^(k-1)
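Prefix expansion is easy to state in code: pad the prefix up to the next multiple of k with every possible bit combination. The sketch below uses hypothetical helper names. The rule in `expand_table` that a longer original prefix overrides a shorter one on a colliding expanded entry is the usual convention and is stated here as an assumption.

```python
from itertools import product

def expand(prefix, k):
    """Expand a bit-string prefix to the next multiple of k bits."""
    pad = (-len(prefix)) % k
    return [prefix + "".join(bits) for bits in product("01", repeat=pad)]

def expand_table(prefixes, k):
    """Map each expanded prefix to the original prefix that owns it."""
    table = {}
    # Process short prefixes first so longer originals overwrite them.
    for name, bits in sorted(prefixes.items(), key=lambda kv: len(kv[1])):
        for expanded in expand(bits, k):
            table[expanded] = name
    return table

print(expand("0", 2))   # the 0* row of the table above
print(expand("11", 2))  # already a multiple of k: unchanged
```

With k = 3, a prefix of length 1 needs two padding bits and therefore expands into 4 = 2^(k-1) entries, the maximum stated above.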

Page 28:

Quadrary-Trie (k=2)

[Figure: 4-ary trie with stride k = 2 for prefixes P1-P9. Each node has four entries: 00, 01, 10, 11. Expanded prefixes carry a/b suffixes: P5 becomes P5a, P5b; P4 becomes P4a, P4b; likewise P1, P2, P3 and P9. P6, P7 and P8 each appear once.]

Page 29:

Prefix Expansion Increases Storage Consumption

• Replication of next-hop ptr
• Greater number of unused (null) pointers in a node
• Improvement: From Fixed-Stride Tries to Variable-Stride Tries

Time ~ W/k
Storage ~ (NW/k) * 2^(k-1)
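To get a feel for the trade-off, the two estimates can be evaluated for a few strides. The sketch below plugs in W = 32 and N = 150,000, figures quoted earlier in the deck; the storage number is a proportional estimate in arbitrary units, not a byte count, and the function names are hypothetical.

```python
def lookup_time(W, k):
    """Trie depth in memory accesses: W/k, rounded up."""
    return -(-W // k)

def storage_estimate(N, W, k):
    """Proportional storage estimate: (N * W / k) * 2**(k - 1)."""
    return N * W / k * 2 ** (k - 1)

W, N = 32, 150_000
for k in (1, 2, 4, 8):
    print(f"k={k}: {lookup_time(W, k):2d} accesses, "
          f"storage ~ {storage_estimate(N, W, k):,.0f}")
```

Doubling the stride halves the number of accesses but grows the storage estimate much faster, which is the motivation for variable-stride tries.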

Page 30:

Ternary Content-Addressable Memory (TCAM)

[Figure: a search key is compared against every row (0-9) of the TCAM array; the match lines feed an encoder, which outputs the index of a matching row]

Each entry is a word in {0,1,*}^W and represents a rule
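A TCAM lookup can be modelled in a few lines: every entry is compared with the key (in hardware, all at once), and an encoder reports one matching row. In this sketch the encoder is assumed to give priority to the lowest index, and longest-prefix matching is obtained by storing the toy prefixes longest first, padded with `*` to width W = 6. Both choices are illustrative assumptions, as are the function names.

```python
def matches(entry, key):
    """Ternary match: '*' in the entry matches either key bit."""
    return len(entry) == len(key) and all(
        e in ("*", k) for e, k in zip(entry, key))

def tcam_search(entries, key):
    """Return the index of the first matching entry, or None."""
    match_lines = [matches(e, key) for e in entries]  # parallel in hardware
    for index, hit in enumerate(match_lines):         # priority encoder
        if hit:
            return index
    return None

TOY = {"P1": "101", "P2": "111", "P3": "11001", "P4": "1", "P5": "0",
       "P6": "1000", "P7": "100000", "P8": "100", "P9": "110"}
W = 6
ordered = sorted(TOY.items(), key=lambda kv: -len(kv[1]))  # longest first
names = [name for name, _ in ordered]
entries = [bits.ljust(W, "*") for _, bits in ordered]

print(names[tcam_search(entries, "100000")])
```

The search cost does not depend on the number of entries, which is the O(1) property; the price is that every entry takes part in every search.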

Page 31:

Example

[Figure: TCAM lookup example: a binary search key is compared against ternary entries over {0,1,*}; the match lines feed the encoder]

Page 32:

TCAM Benefits and Disadvantages

• Deterministic search throughput: O(1) search
• Very flexible; applies to other problems as well
  – Next week: multi-field packet classification
• However, relatively costly and energy-consuming
  – $150 for a small (4 Mbit) TCAM
  – Energy depends on the number of entries
• ~10 million TCAM devices already deployed

Page 33:

Typical Dimensions and Speed

• 100K-200K rules
• 100-150 symbols per rule
• 133 million searches per second for 144-bit keys
  – Suitable even for 40 Gb/s traffic
• IPv4 and IPv6 lookups are trivial with TCAM
• Extra symbols are left in each entry, which can be used to optimize TCAM performance