Beyond TCP: The evolution of Internet transport protocols
Olivier Bonaventure, UCL
http://inl.info.ucl.ac.be
Paris, Polytechnique, Jan. 2016
Agenda
• Internet transport protocols
  – TCP
  – SCTP
• Multipath TCP
  – Basic principles
  – Use cases
• What's next?
  – QUIC
The origins of TCP
Source: http://spectrum.ieee.org/computing/software/the-strange-birth-and-long-life-of-unix
The Unix pipe model
[Figure: echo writes bytes into a pipe and wc reads them as a byte stream]
The TCP bytestream model
[Figure: a client (IP: 1.2.3.4) and a server (IP: 4.5.6.7) connected by TCP; one byte stream flows in each direction ("ABCDEF...111232" and "0988989 ... XYZZ")]
More than 30 years old!
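The bytestream model can be illustrated with a short Python sketch. It uses a local pair of connected stream sockets rather than a real TCP connection across a network (an assumption made to keep the example self-contained), and the helper name send_in_pieces is hypothetical. The point is only that the receiver sees one ordered stream of bytes, not the individual writes of the sender.

```python
import socket

def send_in_pieces(pieces):
    """Write each piece with its own send call, then read the whole stream back."""
    left, right = socket.socketpair()  # two connected stream sockets
    try:
        for piece in pieces:
            left.sendall(piece)
        left.shutdown(socket.SHUT_WR)  # tell the reader that the stream has ended
        received = bytearray()
        while True:
            chunk = right.recv(4096)
            if not chunk:  # empty read: end of stream
                break
            received.extend(chunk)
        return bytes(received)
    finally:
        left.close()
        right.close()

# Two writes on one side; the reader only sees the concatenated bytes.
stream = send_in_pieces([b"ABC", b"DEF"])
```

The boundaries between the two writes are not visible to the reader, which is why applications that need messages must add their own framing on top of TCP.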
Congestion collapse
Jacobson, V. Congestion avoidance and control. In Proceedings of SIGCOMM '88 (Stanford, CA, Aug. 1988), ACM.
Performance issues
• TCP considered to be too complex by many
  – Software implementation cannot cope with increasing network bandwidth
• For high performance, transport should be implemented in hardware
  – Transputers
  – Simpler transport protocols
More limitations of TCP
• Issues with the TCP pipe model
  – Only supports a single bytestream
    • Some applications need several streams with priorities
  – No support for messages
  – Connections are attached to one IP address on client and one IP address on server
    • No failover even if hosts have multiple interfaces
    • No support for mobility
    • No load balancing for multihomed hosts
SCTP: An alternative to TCP
SCTP in two slides
• Modern transport protocol
  – Cleaner connection establishment
    • Four-way handshake to counter SYN flooding attacks
  – Cleaner protocol
    • Flexible TLV packet format that is easy to extend
    • Selective acknowledgements from the start
  – Richer semantics
    • Messages, multiple streams, unreliable delivery
    • Advanced API to replace socket API
  – Failover support
    • Connection can move from one IP address to another one
SCTP connection establishment
INIT, ITag=1234
INIT-ACK, cookie, ITag=5678
COOKIE-ECHO, VTag=5678, cookie
COOKIE-ACK, VTag=1234
Encrypts state in cookie, does not store it
Decrypts cookie, recovers info to create state
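The stateless cookie idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the secret, the JSON encoding and the function names make_cookie and open_cookie are hypothetical, and the sketch authenticates the state with an HMAC instead of encrypting it. What it shows is that the server can hand its state to the client and rebuild it later without storing anything in between.

```python
import hashlib
import hmac
import json

# Assumption: a key known only to the server (hypothetical value).
SECRET = b"hypothetical-server-secret"
TAG_LEN = 32  # size of a SHA-256 digest in bytes

def make_cookie(state):
    """Serialise the state and append a MAC; the server keeps nothing."""
    body = json.dumps(state, sort_keys=True).encode()
    tag = hmac.new(SECRET, body, hashlib.sha256).digest()
    return body + tag

def open_cookie(cookie):
    """Return the state if the MAC verifies, otherwise None."""
    body, tag = cookie[:-TAG_LEN], cookie[-TAG_LEN:]
    expected = hmac.new(SECRET, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # forged or corrupted cookie: no state is created
    return json.loads(body)

# State the server would otherwise have to remember after the INIT.
cookie = make_cookie({"peer_tag": 1234, "my_tag": 5678})
state = open_cookie(cookie)
```

Because memory is only allocated when a valid cookie comes back, a flood of INIT packets from spoofed addresses does not consume server state.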
What went wrong with SCTP?
• Replacing a transport protocol
[Figure: protocol stack with the Physical, Datalink and Network layers, TCP or SCTP as transport, and the Application on top]
Applications must be rewritten with new API
IP protocol = 132 for SCTP packets
Deploying SCTP
• Application developers will invest in SCTP as soon as SCTP is implemented on
  – Clients
  – Servers
The Internet architecture that we explain to our students
[Figure: end hosts implement the Physical, Datalink, Network, Transport and Application layers; intermediate devices implement only the lower layers (Physical; Physical and Datalink; Physical, Datalink and Network)]
O. Bonaventure, Computer networking: Principles, Protocols and Practice, open ebook, http://inl.info.ucl.ac.be/cnp3
SCTP deployment
[Figure: the layered architecture again, with TCP and SCTP shown in the transport layer]
In reality
– almost as many middleboxes as routers
– various types of middleboxes are deployed
Sherry, Justine, et al. "Making middleboxes someone else's problem: Network processing as a cloud service." Proceedings of the ACM SIGCOMM 2012 conference. ACM, 2012.
Internet devices according to Cisco
http://www.cisco.com/web/about/ac50/ac47/2.html
Web Security Appliance
NAC Appliance
ACE XML Gateway
Streamer
VPN Concentrator
SSL Terminator
Cisco IOS Firewall
IP Telephony Router
PIX Firewall Right and Left
Voice Gateway
Content Engine
NAT
Middleboxes in the architecture
• In the official architecture, they do not exist
• In reality...
[Figure: the layered architecture with a middlebox on the path that processes packets up to TCP, between devices and hosts implementing the usual layers]
TCP segments processed by a router
[Figure: a segment before and after the router. IP header: Ver, IHL, ToS, Total length, Identification, Flags, Frag. Offset, TTL, Protocol, Checksum, Source IP address, Destination IP address, Options. TCP header: Source port, Destination port, Sequence number, Acknowledgment number, THL, Reserved, Flags, Window, Checksum, Urgent pointer, Options. Then the Payload.]
TCP segments processed by a NAT
[Figure: IP header, TCP header and payload of a segment, shown before and after the NAT]
TCP segments processed by a NAT (2)
• active mode ftp behind a NAT
220 ProFTPD 1.3.3d Server (BELNET FTPD Server) [193.190.67.15]
ftp_login: user `<null>' pass `<null>' host `ftp.belnet.be'
Name (ftp.belnet.be:obo): anonymous
---> USER anonymous
331 Anonymous login ok, send your complete email address as your password
Password:
---> PASS XXXX
---> PORT 192,168,0,7,195,120
200 PORT command successful
---> LIST
150 Opening ASCII mode data connection for file list
lrw-r--r-- 1 ftp ftp 6 Jun 1 2011 pub -> mirror
226 Transfer complete
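The PORT command in this session carries the client's private address and port as text inside the payload. The sketch below decodes it and shows what a rewrite by a translating device could look like. The function names and the public address 203.0.113.25 are hypothetical, chosen only for illustration.

```python
def parse_port_command(line):
    """Decode 'PORT h1,h2,h3,h4,p1,p2' into (address, port)."""
    command, _, argument = line.strip().partition(" ")
    if command.upper() != "PORT":
        raise ValueError("not a PORT command")
    fields = [int(x) for x in argument.split(",")]
    if len(fields) != 6 or any(not 0 <= f <= 255 for f in fields):
        raise ValueError("PORT expects six numbers between 0 and 255")
    address = ".".join(str(f) for f in fields[:4])
    port = fields[4] * 256 + fields[5]  # the port is sent as two bytes
    return address, port

def rewrite_port_command(line, public_address, public_port):
    """Replace the address and port announced in a PORT command."""
    parse_port_command(line)  # validate the original command first
    fields = public_address.split(".")
    fields += [str(public_port // 256), str(public_port % 256)]
    return "PORT " + ",".join(fields)

original = "PORT 192,168,0,7,195,120"
address, port = parse_port_command(original)
rewritten = rewrite_port_command(original, "203.0.113.25", port)
```

The rewritten command does not have the same length as the original one, so a device that performs this translation changes the number of bytes in the TCP bytestream.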
TCP segments processed by an ALG running on a NAT
[Figure: IP header, TCP header and payload of a segment, shown before and after the ALG]
© O. Bonaventure, 2011
How transparent is the Internet?
• 25th September 2010 to 30th April 2011
• 142 access networks
• 24 countries
• Sent specific TCP segments from client to a server in Japan
Honda, Michio, et al. "Is it still possible to extend TCP?" Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference. ACM, 2011.
End-to-end transparency today
[Figure: IP header, TCP header and payload of a segment, as sent and as received]
Middleboxes don't change the Protocol field, but some discard packets with a Protocol field different than TCP or UDP
Agenda
• Internet transport protocols
  – TCP
  – SCTP
• Multipath TCP
  – Basic principles
  – Use cases
• What's next?
  – QUIC
TCP Connection establishment
• Three-way handshake
SYN, seq=1234, Options
SYN+ACK, ack=1235, seq=5678, Options
ACK, seq=1235, ack=5679
Data transfer
seq=1234, "abcd"
ACK, ack=1238, win=4
seq=1238, "efgh"
ACK, ack=1242, win=0
Connection release
seq=1234, "abcd"
RST
Connection release
seq=1234, "abcd"
ACK, ack=1239
FIN, ack=350
seq=345, "ijkl"
FIN, seq=1238
FIN, seq=349
Multipath TCP
• How can we efficiently use the multiple interfaces that are available on today's hosts?
Design objectives
• Multipath TCP is an evolution of TCP
• Design objectives
  – Support unmodified applications
  – Work over today's networks (IPv4 and IPv6)
  – Works in all networks where regular TCP works
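The Multipath TCP bytestream model
[Figure: a client (IP: 1.2.3.4 and IP: 2.3.4.5) and a server (IP: 4.5.6.7 and IP: 6.7.8.9) exchange the same two byte streams ("ABCDEF...111232" and "0988989 ... XYZZ"); the bytes are spread over the paths, for example "A" on one path and "BCD" on the other]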
The Multipath TCP protocol
• Control plane
  – How to manage a Multipath TCP connection that uses several paths?
• Data plane
  – How to transport data?
• Congestion control
  – How to control congestion over multiple paths?
A naïve Multipath TCP
SYN+Option
SYN+ACK+Option
ACK
seq=123, "abc"
seq=126, "def"
A naïve Multipath TCP in today's Internet?
SYN+Option
SYN+ACK+Option
ACK
seq=123, "abc"
seq=126, "def"
There is no corresponding TCP connection
Design decision
– A Multipath TCP connection is composed of one or more regular TCP subflows that are combined
  • Each host maintains state that glues the TCP subflows that compose a Multipath TCP connection together
  • Each TCP subflow is sent over a single path and appears like a regular TCP connection along this path
Multipath TCP and the architecture
[Figure: the Application uses an unchanged socket; below the socket, Multipath TCP runs on top of the subflows TCP1, TCP2, ... TCPn, above the Network, Datalink and Physical layers]
A. Ford, C. Raiciu, M. Handley, S. Barre, and J. Iyengar, "Architectural guidelines for multipath TCP development", RFC 6182, 2011.
No modification to ease deployment
Multiple subflows to cope with middleboxes
A regular TCP connection
• What is a regular TCP connection?
  – It starts with a three-way handshake
    • SYN segments may contain special options
  – All data segments are sent in sequence
    • There is no gap in the sequence numbers
  – It is terminated by using FIN or RST
Multipath TCP
SYN+Option
SYN+ACK+Option
ACK
SYN+OtherOption
SYN+ACK+OtherOption
ACK
How to combine two TCP subflows?
SYN+Option
SYN+ACK+Option
SYN+OtherOption
SYN+ACK+OtherOption
ACK
How to link with blue subflow?
TCP 101: Identification of a TCP connection
Four tuple
– IP source
– IP dest
– Port source
– Port dest
All TCP segments contain the four tuple
[Figure: IP and TCP headers of a segment, showing the Source IP address and Destination IP address in the IP header and the Source port and Destination port in the TCP header]
How to link TCP subflows?
SYN, Portsrc=1234, Portdst=80 + Option
SYN+ACK [...]
ACK
SYN, Portsrc=1235, Portdst=80 + Option [link Portsrc=1234, Portdst=80]
A NAT could change addresses and port numbers
How to link TCP subflows?
SYN, Portsrc=1234, Portdst=80 + Option [Token=5678]
SYN+ACK + Option [Token=6543]
ACK
SYN, Portsrc=1235, Portdst=80 + Option [Token=6543]
Client: MyToken=5678, YourToken=6543
Server: MyToken=6543, YourToken=5678
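The token exchange can be sketched as a small lookup table kept by each host. The class and method names below are hypothetical, and the sketch leaves out everything except the lookup itself. In RFC 6824 the tokens are derived from keys exchanged during the first handshake, and a joining subflow is also authenticated. What matters here is that a subflow is attached through a token and not through addresses or ports that a NAT may have rewritten.

```python
import secrets

class Host:
    """Keeps the state that glues the subflows of each connection together."""

    def __init__(self):
        self.connections = {}  # local token -> list of subflow four-tuples

    def new_connection(self, four_tuple):
        """Register the initial subflow and return the local token."""
        token = secrets.randbits(32)
        while token in self.connections:  # tokens must be locally unique
            token = secrets.randbits(32)
        self.connections[token] = [four_tuple]
        return token

    def join(self, token, four_tuple):
        """Attach a new subflow to the connection identified by the token."""
        subflows = self.connections.get(token)
        if subflows is None:
            return False  # unknown token: no corresponding connection
        subflows.append(four_tuple)
        return True

server = Host()
token = server.new_connection(("1.2.3.4", 1234, "4.5.6.7", 80))
# The second subflow may arrive with translated addresses and ports.
joined = server.join(token, ("198.51.100.9", 40000, "4.5.6.7", 80))
```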
TCP subflows
• Which subflows can be associated to a Multipath TCP connection?
  – At least one of the elements of the four-tuple needs to differ between two subflows
    • Local IP address
    • Remote IP address
    • Local port
    • Remote port
TCP subflows in practice
• Multipath TCP supports subflow agility
  – Client/server can add subflows at any time
  – Client/server can remove subflows at any time
The Multipath TCP protocol
• Control plane
  – How to manage a Multipath TCP connection that uses several paths?
• Data plane
  – How to transport data?
• Congestion control
  – How to control congestion over multiple paths?
How to transfer data?
seq=123, "a"
seq=124, "b"
seq=125, "c"
seq=126, "d"
ack=124
ack=126
ack=125
ack=127
How to transfer data in today's Internet?
seq=123, "a"
seq=124, "b"
seq=125, "c"
ack=124
ack=126
ack=125
Gap in sequence numbering space
Some DPI will not allow this!
Multipath TCP Data transfer
• Two levels of sequence numbers
[Figure: on each host, Multipath TCP sits between the socket and the subflows TCP1 and TCP2; the bytes "ABCDEF" carry a Data sequence #, and each subflow keeps its own numbering (TCP1 sequence #, TCP2 sequence #)]
Multipath TCP Data transfer
DSeq=0, seq=123, "a"
DSeq=1, seq=456, "b"
DSeq=2, seq=124, "c"
DAck=1, ack=124
DAck=3, ack=125
DAck=2, ack=457
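The two levels of sequence numbers can be reproduced with a minimal sender model. The class names Subflow and MptcpSender are hypothetical and the model ignores acknowledgements, windows and retransmissions. It only shows how one connection-level counter and one counter per subflow advance independently.

```python
class Subflow:
    """One regular TCP subflow with its own sequence number space."""

    def __init__(self, name, initial_seq):
        self.name = name
        self.next_seq = initial_seq

class MptcpSender:
    """Assigns a data sequence number and a subflow sequence number to each segment."""

    def __init__(self, subflows):
        self.subflows = subflows
        self.next_dseq = 0  # connection-level data sequence number

    def send(self, data, subflow_index):
        subflow = self.subflows[subflow_index]
        segment = (subflow.name, self.next_dseq, subflow.next_seq, data)
        self.next_dseq += len(data)      # advances for every byte of the connection
        subflow.next_seq += len(data)    # advances only for bytes sent on this subflow
        return segment

sender = MptcpSender([Subflow("TCP1", 123), Subflow("TCP2", 456)])
segments = [sender.send(b"a", 0), sender.send(b"b", 1), sender.send(b"c", 0)]
```

Each subflow sees consecutive sequence numbers (123 then 124 on the first one), so there is no gap for a middlebox to object to, while the data sequence numbers let the receiver restore the order "abc".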
Multipath TCP: How to deal with losses?
• Data losses over one TCP subflow
  – Fast retransmit and timeout as in regular TCP
DSeq=0, seq=123, "a"
DAck=1, ack=124
DSeq=0, seq=123, "a"
DAck=1, ack=124
Multipath TCP
• What happens when a TCP subflow fails?
DSeq=0, seq=123, "a"
DSeq=1, seq=456, "b"
DAck=0, ack=457
DSeq=0, seq=457, "a"
DAck=2, ack=458
Retransmission heuristics
• Heuristics used by current Linux implementation
  – Fast retransmit is performed on the same subflow as the original transmission
  – Upon timeout expiration, reevaluate whether the segment could be retransmitted over another subflow
  – Upon loss of a subflow, all the unacknowledged data are retransmitted on other subflows
Flow control
• How should the window-based flow control be performed?
  – Independent windows on each TCP subflow
  – A single window that is shared among all TCP subflows
Independent windows
DSeq=0, seq=123, "a"
DSeq=1, seq=456, "b"
DAck=2, ack=457, win=100
DSeq=2, seq=457, "c"
DAck=3, ack=458, win=100
DAck=1, ack=124, win=0
Independent windows: possible problem
• Impossible to retransmit, window is already full on green subflow
DSeq=0, seq=123, "a"
DSeq=1, seq=456, "b"
DAck=2, ack=457, win=0
A single window shared by all subflows
DSeq=0, seq=123, "a"
DSeq=1, seq=456, "b"
DAck=2, ack=457, win=10
DSeq=2, seq=457, "c"
DAck=3, ack=458, win=10
DAck=1, ack=124, win=10
A single window shared by all subflows: impact of middleboxes
DSeq=0, seq=123, "a"
DSeq=1, seq=456, "b"
DAck=2, ack=457, win=100
DAck=1, ack=124, win=100
DAck=2, ack=457, win=5
Multipath TCP Windows
• Multipath TCP maintains one window per Multipath TCP connection
  – Window is relative to the last acked data (Data Ack)
  – Window is shared among all subflows
• It's up to the implementation to decide how the window is shared
  – Window is transmitted inside the window field of the regular TCP header
  – If middleboxes change window field,
    • use largest window received at MPTCP-level
    • use received window over each subflow to cope with the flow control imposed by the middlebox
Multipath TCP buffers
[Figure: the application calls send(...) and recv(...) on the socket; Multipath TCP and its scheduler sit above the subflows TCP1 and TCP2]
Transmit queues, process only regular TCP header
Reorder queue, processes only TCP header
MPTCP-level, resequencing possible
Sending Multipath TCP information
• How to exchange the Multipath TCP specific information between two hosts?
• Option 1
  – Use TLVs to encode data and control information inside payload of subflows
• Option 2
  – Use TCP options to encode all Multipath TCP information
Option 1: Michael Scharf, Thomas-Rolf Banniza, MCTCP: A Multipath Transport Shim Layer, GLOBECOM 2011
Multipath TCP with only options
• Advantages
  – Normal way of extending TCP
  – Should be able to go through middleboxes or fallback
• Drawbacks
  – limited size of the TCP options, notably inside SYN
  – What happens when middleboxes drop TCP options in data segments
Multipath TCP using TLV
• Advantages
  – Multipath TCP could start as regular TCP and move to Multipath only when needed
  – Could be implemented as a library in user space
  – TLVs can be easily extended
• Drawbacks
  – TCP segments contain TLVs including the data and not only the data
    • problem for middleboxes, DPI, ..
  – Middleboxes become more difficult
Michael Scharf, Thomas-Rolf Banniza, MCTCP: A Multipath Transport Shim Layer, GLOBECOM 2011
Is it safe to use TCP options?
• Known option (TS) in Data segments
Honda, Michio, et al. "Is it still possible to extend TCP?" Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference. ACM, 2011.
Is it safe to use TCP options?
• Unknown option in Data segments
Honda, Michio, et al. "Is it still possible to extend TCP?" Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference. ACM, 2011.
Multipath TCP options
• TCP option format
  Kind | Length | Option-specific data
• Initial design
  – One option kind for each purpose (e.g. Data Sequence number)
• Final design
  – A single variable-length Multipath TCP option
Multipath TCP option
• A single option type
  – to minimise the risk of having one option accepted by middleboxes in SYN segments and rejected in segments carrying data
  Kind | Length | Subtype | Subtype-specific data (variable length)
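The single-option layout can be sketched with Python's struct module. Kind 30 is the TCP option kind assigned to Multipath TCP in RFC 6824, and the subtype occupies the upper four bits of the third byte. The function names and the low_nibble parameter are hypothetical; the meaning of those four remaining bits depends on the subtype and is not modelled here.

```python
import struct

MPTCP_KIND = 30       # TCP option kind assigned to Multipath TCP (RFC 6824)
MAX_TCP_OPTIONS = 40  # bytes available for all options in one TCP header

def encode_option(subtype, low_nibble=0, body=b""):
    """Build Kind, Length, Subtype (4 bits) + 4 more bits, then the subtype-specific data."""
    if not 0 <= subtype <= 15 or not 0 <= low_nibble <= 15:
        raise ValueError("subtype and low_nibble are 4-bit values")
    length = 3 + len(body)  # the length covers the whole option
    if length > MAX_TCP_OPTIONS:
        raise ValueError("option does not fit in the TCP header")
    header = struct.pack("!BBB", MPTCP_KIND, length, (subtype << 4) | low_nibble)
    return header + body

def decode_option(raw):
    """Return (subtype, low_nibble, subtype-specific data)."""
    kind, length, third = struct.unpack("!BBB", raw[:3])
    if kind != MPTCP_KIND or length != len(raw):
        raise ValueError("not a well-formed Multipath TCP option")
    return third >> 4, third & 0x0F, raw[3:]

raw = encode_option(2, 0, b"\x01\x02")
decoded = decode_option(raw)
```

Since every Multipath TCP message shares the same Kind, a middlebox that lets the option through in SYN segments has no kind-based reason to treat it differently in data segments.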
Data sequence numbers and TCP segments
• How to transport Data sequence numbers?
  – Same solution as for TCP
    • Data sequence number in TCP option is the Data sequence number of the first byte of the segment
[Figure: TCP header (Source port, Destination port, Sequence number, Acknowledgment number, THL, Reserved, Flags, Window, Checksum, Urgent pointer) with the Data sequence number carried as an option, followed by the Payload]
Multipath TCP Data transfer
DSeq=0, seq=123, "a"
DSeq=1, seq=456, "b"
DSeq=2, seq=124, "c"
DAck=1, ack=124
DAck=3, ack=125
DAck=2, ack=457
Which middleboxes change TCP sequence numbers?
• Some firewalls change TCP sequence numbers in SYN segments to ensure randomness
  – fix for old Windows 95 bug
• Transparent proxies terminate TCP connections
Middlebox interference
• Data segments
Data, seq=12, "ab"
Data, seq=14, "cd"
Data, seq=12, "abcd"
Such a middlebox could also be the network adapter of the server that uses LRO to improve performance.
Segment coalescing
Honda, Michio, et al. "Is it still possible to extend TCP?" Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference. ACM, 2011.
Data sequence numbers and middleboxes
seq=123, DSeq=0, "a"
seq=456, DSeq=1, "b"
seq=124, DSeq=2, "c"
Middlebox buffers small segments
Middlebox copies one option in coalesced segment
seq=123, DSeq=2, "ac"
seq=123, DSeq=0, "ac"
Data sequence numbers and middleboxes
seq=123, DSeq=0, "ab"
DSeq=0, seq=123, "a"
DSeq=0, seq=124, "b"
Middlebox only understands regular TCP
A "middlebox" that both splits and coalesces TCP segments
Data sequence numbers and middleboxes
• How to avoid desynchronisation between the bytestream and data sequence numbers?
• Solution
  – Multipath TCP option carries mapping between Data sequence numbers and (difference between initial and current) subflow sequence numbers
    • mapping covers a part of the bytestream (length)
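A mapping can be modelled as a triple (data sequence number, relative subflow sequence number, length). The sketch below is a simplified receiver under stated assumptions: relative subflow sequence numbers are taken as zero-based offsets into the subflow's bytestream, and the function names are hypothetical. It shows why segment boundaries stop mattering once the mapping refers to positions in the subflow bytestream.

```python
def apply_mappings(subflow_bytes, mappings):
    """Place each mapped byte of one subflow at its data sequence number.

    subflow_bytes: the in-order bytestream delivered by the subflow.
    mappings: (data_seq, relative_subflow_seq, length) triples.
    """
    placed = {}
    for data_seq, rel_seq, length in mappings:
        for i in range(length):
            placed[data_seq + i] = subflow_bytes[rel_seq + i:rel_seq + i + 1]
    return placed

def reassemble(subflows):
    """Merge the mapped bytes of all subflows into the connection bytestream."""
    placed = {}
    for subflow_bytes, mappings in subflows:
        placed.update(apply_mappings(subflow_bytes, mappings))
    return b"".join(placed[seq] for seq in sorted(placed))

# First subflow carries "a" (data seq 0) and "c" (data seq 2); second carries "b".
# Whether "a" and "c" arrived as two segments or as one coalesced segment "ac",
# the subflow bytestream is the same, so the result does not change.
stream = reassemble([
    (b"ac", [(0, 0, 1), (2, 1, 1)]),
    (b"b", [(1, 0, 1)]),
])
```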
Multipath TCP Data transfer
seq=123, DSS[0->123,len=1], "a"
seq=456, DSS[1->456,len=1], "b"
seq=124, DSS[2->124,len=1], "c"
DAck=1, ack=124
DAck=3, ack=125
DAck=2, ack=457
Data sequence numbers and middleboxes
seq=123, DSS[0->123,len=1], "a"
seq=456, DSS[1->456,len=1], "b"
seq=124, DSS[2->124,len=1], "c"
seq=123, DSS[0->123,len=1], "ac"
DAck=2, ack=125
DSeq=0, ack=457
seq=125, DSS[2->125,len=1], "c"
Data sequence numbers and middleboxes
seq=123, DSS[0->123,len=1], "a"
seq=456, DSS[1->456,len=1], "b"
seq=124, DSS[2->124,len=1], "c"
seq=123, DSS[2->124,len=1], "ac"
DAck=0, ack=125
seq=125, DSS[0->125,len=1], "a"
DAck=3, ack=126
Multipath TCP and middleboxes
• With the DSS mapping, Multipath TCP can cope with middleboxes that
  – combine segments
  – split segments
• Are they the most annoying middleboxes for Multipath TCP?
  – Unfortunately not
TCP sequence number and middleboxes
Honda, Michio, et al. "Is it still possible to extend TCP?" Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference. ACM, 2011.
The worst middlebox
• Is this an academic exercise or reality?
seq=123, DSS[1->123,len=2], "aXXXb"
DAck=3, ack=125
seq=125, DSS[3->125,len=2], "cd"
seq=123, DSS[1->123,len=2], "ab"
DAck=3, ack=128
seq=128, DSS[3->125,len=2], "cd"
The worst middlebox
• Is unfortunately very old...
  – Any ALG for a NAT
220 ProFTPD 1.3.3d Server (BELNET FTPD Server) [193.190.67.15]
ftp_login: user `<null>' pass `<null>' host `ftp.belnet.be'
Name (ftp.belnet.be:obo): anonymous
---> USER anonymous
331 Anonymous login ok, send your complete email address as your password
Password:
---> PASS XXXX
---> PORT 192,168,0,7,195,120
200 PORT command successful
---> LIST
150 Opening ASCII mode data connection for file list
lrw-r--r-- 1 ftp ftp 6 Jun 1 2011 pub -> mirror
226 Transfer complete
Coping with the worst middlebox
• What should Multipath TCP do in the presence of such a worst middlebox?
  – Do nothing and ignore the middlebox
    • but then the bytestream and the application would be broken and this problem will be difficult to debug by network administrators
  – Detect the presence of the middlebox
    • and fallback to regular TCP (i.e. use a single path and nothing fancy)
Multipath TCP MUST work in all networks where regular TCP works.
Detecting the worst middlebox?
• How can Multipath TCP detect a middlebox that modifies the bytestream and inserts/removes bytes?
  – Various solutions were explored
  – In the end, Multipath TCP chose to include its own checksum to detect insertion/deletion of bytes
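The detection can be sketched with the 16-bit one's complement checksum that TCP itself uses. This is a simplified illustration: the function names are hypothetical, and the pseudo header that the real DSS checksum also covers is left out. It only shows that inserted bytes make the announced checksum fail.

```python
def internet_checksum(data):
    """16-bit one's complement of the one's complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold the carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def payload_was_modified(received, announced_checksum):
    """True when the received bytes do not match the sender's checksum."""
    return internet_checksum(received) != announced_checksum

announced = internet_checksum(b"ab")  # computed by the sender over the mapped data
untouched = payload_was_modified(b"ab", announced)
altered = payload_was_modified(b"aXXXb", announced)
```

When the check fails, the receiver knows that the bytestream was modified on this path and can trigger the fallback to regular TCP.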
The worst middlebox
seq=123, DSS[1->123,len=2,Inv], "aXXXb"
seq=123, DSS[1->123,len=2,V], "ab"
RST, last DSeq=0
RST, last DSeq=0
seq=456, DSS[1->456,len=2,V], "ab"
DAck=3, ack=458
Multipath TCP Data sequence numbers
• What should be the length of the data sequence numbers?
  – 32 bits
    • compact and compatible with TCP
    • wrap around problem at high speed requires PAWS
  – 64 bits
    • wrap around is not an issue for most transfers today
    • takes more space inside each segment
Multipath TCP Data sequence numbers
• Data sequence numbers and Data acknowledgements
  – Maintained inside implementation as 64 bits field
  – Implementations can, as an optimisation, only transmit the lower 32 bits of the data sequence and acknowledgements
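When only the lower 32 bits travel on the wire, the receiver has to rebuild the full 64-bit value from what it expects next. The sketch below shows one possible way to do it, by picking the candidate closest to the expected value. The function name expand_dsn and this selection rule are assumptions made for illustration, not the rule of any particular implementation.

```python
MOD32 = 1 << 32

def expand_dsn(low32, expected64):
    """Return the 64-bit value whose lower 32 bits are low32 and that is closest to expected64."""
    base = expected64 - (expected64 % MOD32)  # expected value with its lower 32 bits cleared
    candidates = (base - MOD32 + low32, base + low32, base + MOD32 + low32)
    valid = [c for c in candidates if c >= 0]
    return min(valid, key=lambda c: abs(c - expected64))

# The 32-bit counter has just wrapped around: 5 means 2**32 + 5, not 5.
after_wrap = expand_dsn(5, MOD32 - 10)
# A slightly older value received just after the wrap.
before_wrap = expand_dsn(0xFFFFFFF0, MOD32 + 3)
```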
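Data Sequence Signal option
[Figure: layout of the Data Sequence Signal (DSS) option]
Cumulative Data ack
A = Data ACK present; a = Data ACK is 8 octets; M = mapping present; m = DSN is 8 octets
Length of mapping, can extend beyond this segment
Checksum computed over data covered by entire mapping + pseudo header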
The Multipath TCP protocol
• Control plane
  – How to manage a Multipath TCP connection that uses several paths?
• Data plane
  – How to transport data?
• Congestion control
  – How to control congestion over multiple paths?
  – Congestion windows on subflows MUST be coupled to ensure that TCP remains fair with regular TCP
AIMD in TCP
• Congestion control mechanism
  – Each host maintains a congestion window (cwnd)
  – No congestion
    • Congestion avoidance (additive increase)
      – increase cwnd by one segment every round-trip-time
  – Congestion
    • TCP detects congestion by detecting losses
    • Mild congestion (fast retransmit – multiplicative decrease)
      – cwnd=cwnd/2 and restart congestion avoidance
    • Severe congestion (timeout)
      – cwnd=1, set slow-start-threshold and restart slow-start
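The rules above can be written as a small state update, counted in segments and applied once per round-trip-time. This is a sketch under simplifying assumptions: the function name aimd_step and the event names are hypothetical, slow-start is modelled as a doubling per round-trip-time capped at the threshold, and the threshold after a loss is set to half of the window.

```python
def aimd_step(cwnd, ssthresh, event):
    """Return the new (cwnd, ssthresh) after one event.

    event is 'ack_rtt' (one round-trip-time without loss),
    'fast_retransmit' (mild congestion) or 'timeout' (severe congestion).
    """
    if event == "ack_rtt":
        if cwnd < ssthresh:
            return min(cwnd * 2, ssthresh), ssthresh  # slow-start: exponential increase
        return cwnd + 1, ssthresh                     # congestion avoidance: additive increase
    if event == "fast_retransmit":
        half = max(cwnd // 2, 1)
        return half, half                             # multiplicative decrease
    if event == "timeout":
        return 1, max(cwnd // 2, 2)                   # restart slow-start
    raise ValueError("unknown event: " + event)

# A short trace: slow-start, congestion avoidance, then a mild loss.
state = (1, 8)
trace = []
for event in ["ack_rtt", "ack_rtt", "ack_rtt", "ack_rtt", "fast_retransmit"]:
    state = aimd_step(state[0], state[1], event)
    trace.append(state[0])
```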
Evolution of the congestion window
[Figure: cwnd over time. Slow-start (exponential increase of cwnd) up to the threshold, then congestion avoidance (linear increase of cwnd); each fast retransmit lowers cwnd and the threshold]
Congestion control for Multipath TCP
• Simple approach
  – independent congestion windows
[Figure: congestion windows of the subflows evolving independently, each with its own threshold]
Independent congestion windows
• Problem
[Figure: example with a 12 Mbps link]
Coupled congestion control
• Congestion windows are coupled
  – congestion window growth cannot be faster than TCP with a single flow
  – Coupled congestion control aims at moving traffic away from congested path
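One way to couple the windows is the linked increase rule standardised in RFC 6356, sketched below with windows counted in segments. The function names are hypothetical and the sketch covers only the increase applied for each acknowledgement during congestion avoidance; the decrease after a loss stays the same as in regular TCP.

```python
def lia_alpha(subflows):
    """Aggressiveness factor of RFC 6356 for a list of (cwnd, rtt) pairs."""
    total = sum(cwnd for cwnd, _ in subflows)
    best = max(cwnd / (rtt * rtt) for cwnd, rtt in subflows)
    denominator = sum(cwnd / rtt for cwnd, rtt in subflows) ** 2
    return total * best / denominator

def increase_per_ack(subflows, index):
    """Window increase (in segments) for one acknowledgement on subflow index."""
    total = sum(cwnd for cwnd, _ in subflows)
    coupled = lia_alpha(subflows) / total
    uncoupled = 1.0 / subflows[index][0]  # what regular TCP would do on this subflow
    return min(coupled, uncoupled)        # never grow faster than a single TCP flow

single = increase_per_ack([(10, 0.1)], 0)
paired = increase_per_ack([(10, 0.1), (10, 0.1)], 0)
```

With a single subflow the rule gives the same increase as regular TCP (1/cwnd per acknowledgement). With two identical subflows each one grows more slowly, which is what keeps the connection from taking more than a regular TCP flow on a shared bottleneck.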
Agenda
• Internet transport protocols
  – TCP
  – SCTP
• Multipath TCP
  – Basic principles
  – Use cases
• What's next?
  – QUIC
Multipath TCP use cases
The beast
TCP on servers
• How to increase server bandwidth?
• Load balancing techniques
  – packet per packet
  – per flow load balancing
    • each TCP connection is mapped onto one interface
Increasing server bandwidth with Multipath TCP
• Load balancing with Multipath TCP
  – Congestion control efficiently uses the two links for each MPTCP connection
  – Automatic failover in case of failures
How fast can Multipath TCP go?
http://linux.slashdot.org/story/13/03/23/0054252/a-50-gbps-connection-with-multipath-tcp
How fast can Multipath TCP go?
Datacenters evolve
• Traditional topologies are tree-based
  – Poor performance
  – Not fault tolerant
• Shift towards multipath topologies: FatTree, BCube, VL2, Cisco, EC2
…
C. Raiciu, et al. "Improving datacenter performance and robustness with multipath TCP," ACM SIGCOMM 2011.
Fat Tree Topology [Fares et al., 2008; Clos, 1953]
[Figure: Fat Tree with K=4: K pods with K switches each, aggregation switches, racks of servers, 1 Gbps links]
Fat Tree Topology [Fares et al., 2008; Clos, 1953]
[Figure: Fat Tree with K=4: K pods with K switches each, aggregation switches, racks of servers]
C. Raiciu, et al. "Improving datacenter performance and robustness with multipath TCP," ACM SIGCOMM 2011.
Collisions
TCP in datacenters
TCP in FAT tree networks
Cost of collisions
C. Raiciu, et al. "Improving datacenter performance and robustness with multipath TCP," ACM SIGCOMM 2011.
[Figure: throughput (Mb/s, 0 to 1000) versus rank of flow (0 to 9000); legend: MPTCP, Optimal Throughput, TCP Flow Throughput]
How to get rid of these collisions?
• Consider TCP performance as an optimisation problem
C. Raiciu, et al. "Improving datacenter performance and robustness with multipath TCP," ACM SIGCOMM 2011.
The Multipath TCP way
Two subflows differ by their source port
ECMP balances the subflows over different paths
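Flow-based ECMP can be approximated as a hash of the five-tuple taken modulo the number of equal-cost paths. The sketch below uses CRC32 as a stand-in hash and a hypothetical function name; real switches use their own hash functions and fields. It shows why changing only the source port is enough to give subflows a chance of taking different paths.

```python
import zlib

def ecmp_path(src_ip, dst_ip, src_port, dst_port, n_paths, protocol=6):
    """Pick one of n_paths from a hash of the five-tuple (protocol 6 is TCP)."""
    key = f"{src_ip}|{dst_ip}|{protocol}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % n_paths

# Subflows of one connection: same addresses and destination port, different source ports.
paths = {
    src_port: ecmp_path("10.0.1.2", "10.0.3.4", src_port, 80, 4)
    for src_port in range(1234, 1234 + 64)
}
distinct_paths = set(paths.values())
```

All packets of one subflow hash to the same path, so there is no reordering inside a subflow, while different subflows can be spread over the available paths. The host cannot predict which path a given source port selects.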
MPTCP better utilizes the FatTree network
[Figure: throughput (Mb/s, 0 to 1000) versus rank of flow (0 to 9000); legend: MPTCP, Optimal Throughput, TCP Flow Throughput]
C. Raiciu, et al. "Improving datacenter performance and robustness with multipath TCP," ACM SIGCOMM 2011.
See also G. Detal, et al., Revisiting Flow-Based Load Balancing: Stateless Path Selection in Data Center Networks, Computer Networks, April 2013 for extensions to ECMP for MPTCP
How many subflows does Multipath TCP need?
[Figure: total throughput (% of optimal, 0 to 100) for RLB and for 2 to 8 subflows; legend: Multipath TCP, TCP]
C. Raiciu, et al. "Improving datacenter performance and robustness with multipath TCP," ACM SIGCOMM 2011.
Can we improve Multipath TCP?
• Two subflows may follow similar paths
Improving ECMP
• ECMP's hash
  – good load balancing
  – impossible to predict result
• CFLB
  – replaces hash with block cipher
  – hosts can select paths for Multipath TCP subflows provided they know datacenter topology
G. Detal, Ch. Paasch, S. van der Linden, P. Mérindol, G. Avoine, O. Bonaventure, Revisiting Flow-Based Load Balancing: Stateless Path Selection in Data Center Networks, to appear in Computer Networks
Multipath TCP with CFLB in Fat-Tree
G. Detal, Ch. Paasch, S. van der Linden, P. Mérindol, G. Avoine, O. Bonaventure, Revisiting Flow-Based Load Balancing: Stateless Path Selection in Data Center Networks, to appear in Computer Networks
Multipath TCP on EC2
• Amazon EC2: infrastructure as a service
  – We can borrow virtual machines by the hour
  – These run in Amazon data centers worldwide
  – We can boot our own kernel
• A few availability zones have multipath topologies
  – 2-8 paths available between hosts not on the same machine or in the same rack
  – Available via ECMP
Amazon EC2 Experiment
• 40 medium CPU instances running MPTCP
• During 12 hours, we sequentially ran all-to-all iperf cycling through:
  – TCP
  – MPTCP (2 and 4 subflows)
MPTCP improves performance on EC2
[Figure: throughput (Mb/s, 0 to 1000) versus flow rank (0 to 3000); legend: TCP; MPTCP, 4 subflows; MPTCP, 2 subflows; annotation: Same Rack]
C. Raiciu, et al. "Improving datacenter performance and robustness with multipath TCP," ACM SIGCOMM 2011.
Motivation
• One device, many IP-enabled interfaces
ssh with Multipath TCP
MPTCP over WiFi/3G
[Figure: two paths, one with 8 Mbps and 20 ms, the other with 2 Mbps and 150 ms]
TCP over WiFi/3G
C. Raiciu, et al. "How hard can it be? Designing and implementing a deployable multipath TCP," NSDI'12: Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation, 2012.
MPTCP over WiFi/3G
C. Raiciu, et al. "How hard can it be? Designing and implementing a deployable multipath TCP," NSDI'12: Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation, 2012.
MPTCP over WiFi/3G
Multipath TCP increases throughput
MPTCP over WiFi/3G
What happened here?
Understanding the performance issue
[Figure: two paths (8 Mbps, 20 ms and 2 Mbps, 150 ms) and the window holding segments A, B, C, D]
Window full! No new data can be sent on WiFi path
Reinject segment on fast path
Halve congestion window on slow subflow
MPTCP over WiFi/3G
Multipath TCP use cases
Low latency for Siri
• Long-lived TLS connections
[Figure: voice samples sent over WiFi and over 3G/LTE]
Multipath TCP use cases
High bandwidth on smartphones
• Koreans want 800+ Mbps on smartphones
[Figure: Multipath TCP over WiFi and 4G/LTE up to a SOCKS proxy, regular TCP beyond it]
Faster broadband networks?
Multipath TCP use cases
Hybrid Access Networks
[Figure: Multipath TCP over DSL and 4G/LTE up to a Hybrid Access Gateway, regular TCP on both sides of the Multipath TCP part]
Agenda
• Internet transport protocols
  – TCP
  – SCTP
• Multipath TCP
  – Basic principles
  – Use cases
• What's next?
  – QUIC
Issues with the current stack
[Stack: Physical, Datalink, IPv4/IPv6, TCP, HTTP 1.1]
ASCII difficult to parse, no priority
Unsecure
Wait for three-way handshake before data transfer
First bytes after 2 RTTs
[Stack: Physical, Datalink, IPv4/IPv6, TCP, TLS, HTTP/2]
Secure, but adds more delay
First bytes after 3-4 RTTs
[Stack: Physical, Datalink, IPv4/IPv6, UDP, QUIC]
First bytes after 0 RTT
QUIC in a nutshell
• First connection attempt
CHLO [SNI, VER] (Server Name and Version)
REJ [Config, Token, Certificate] (Rejected)
CHLO [Token, Crypto info]
DATA [Encrypted]
SHLO [Config, Token, Certificate]
DATA [Encrypted]
QUIC features
• Congestion control
  – Leverages TCP's long history (CUBIC)
• Retransmissions
  – Better than with regular TCP
  – Each segment has a different seqnum
    • Avoids retransmission ambiguities
• Selective acknowledgements
  – Cleaner than in TCP
QUIC usage at Google
QUIC handshakes fail when RTTs are greater than 2.5 seconds or when UDP is blocked
Source: J. Iyengar, QUIC Overview, IETF 93, July 2015, Prague
QUIC: Reducing delays
[Figure: handshake delays compared for TCP, TCP + TLS, and QUIC (equivalent to TCP + TLS)]
Source: J. Iyengar, QUIC Overview, IETF 93, July 2015, Prague
Why run QUIC over UDP?
• Simplest transport protocol
  – Supported correctly by all operating systems
  – Supported correctly by all middleboxes
• Application can entirely control everything
  – Same version of QUIC runs on all platforms
  – QUIC can be upgraded as frequently as the application
  – Application developer does not need to coordinate with IETF or anyone
How to cope with middleboxes?
• Very few middleboxes interfere with UDP
  – Some middleboxes drop UDP segments
    • Applications will detect and fallback to TCP
  – Some middleboxes rate limit UDP
    • Applications will detect and fallback to TCP
• What about middleboxes optimising QUIC/UDP
  – Nightmare for Google
  – Everything in QUIC (payload and headers) is encrypted
TFO: A Faster TCP
• Simple idea: send data in SYN segments
  – Modern version of T/TCP
SYN (Src=C, seq=x, HTTP GET)
HTTP GET
SYN+ACK (Dest=C, ack=x+1, seq=y, HTTP Resp)
ACK (Src=A, seq=x)
Internet transport layer
• Still lots of innovation for an old layer…
  – TCP extensions
    • Initial window, TCP Fast Open, …
  – Multipath TCP is getting deployed
    • RFC 6824 was published in January 2013
  – But middleboxes have ossified the Internet
• Other protocols
  – QUIC
    • Pushed by Google for web applications
  – TCPINC
    • Support encryption inside transport layer
  – TLS 1.3
    • Faster handshake and lower delays