
1 Building a connection-oriented internet
Malathi Veeraraghavan, Univ. of Virginia
Outline:
- Problem statement
- CHEETAH: an NSF-funded experimental project
- Research problems
- The "internet" name in the title

2 Problem statement
- How do we add a complementary connection-oriented internet to the existing (connectionless) Internet?
- Goal: allow end-host applications to make a reservation for bandwidth, anywhere from 10Mbps to 10Gbps

3 Motivation
Analogy with transportation networks:
- "Connectionless" roadways: no reservation made; "fair", but delay/jitter uncontrolled
- "Connection-oriented" airlines: reservation based; better guarantees on delay/jitter
- Call to make a reservation (if only for part of the distance: airport-to-airport)

4 Motivation: eScience
- One 36-hour supernova simulation creates 1TB
[Figure labels: NCSU, Supercomputers]

5 Use Internet2 for 1TB download
- Bottleneck link rate from NCSU to ORNL via Internet2/ESnet is 2.5 Gbps
- But Prof. Blondin at NCSU only sees Mbps. Why?

6 Two reasons
- Disk access limitations at end hosts
- TCP/IP bandwidth sharing mode
  - IP networks are connectionless, i.e., no reservations
  - New transfers can simply start up, which impacts the bandwidth available to ongoing transfers
  - Socialistic resource sharing
[Figure: Host, IP router, Host; plot of number of bytes sent vs. time, showing a changing rate]

7 Pros and cons of approach
- It allows an ongoing transfer to enjoy bandwidth released by other transfers as they complete
- On the other hand, as new transfers start up within the duration of the 1TB transfer, its rate decreases

8 Coming back to the problem statement
- How do we provide scientists a rate-guaranteed connection for file transfers?
- At first glance, file transfers look like an ideal application for high-speed circuits
  - no burstiness
  - can use as much bandwidth as given

9 One answer: use optical fibers and circuit-based gateways
- Guaranteed rates + high bandwidth
- Gateways are available that can dynamically crossconnect a Gigabit Ethernet port to an equivalent-rate time-division or wavelength-division multiplexed signal
[Figure: chain of circuit-based gateways with Gigabit Ethernet interfaces to hosts at both ends. Gateway components: control (signaling engine: dynamic call setup/release), Gigabit Ethernet interface card, time-division or wavelength-division multiplexing optical interface card. Steps: set up connection (make reservation), transfer file, release connection (release resources)]

10 Networks deployed based on this thinking (related work)
- Canada's Canarie: CA*net4
- OptIPuter project: UCSD and Chicago
- Chicago: OMNInet (Starlight PoP)
- Dragon: DC area network
- Netherlands: SURFnet
- UK: UKlight
- DOE UltraScience network

11 Our NSF-funded project: CHEETAH
Circuit-switched High-Speed End-to-End Transport arcHitecture
Participants:
- Malathi Veeraraghavan, UVA
- Nagi Rao, Bill Wing, Tony Mezzacappa, ORNL
- John Blondin, NCSU
- Ibrahim Habib, CUNY
$3.5M project for three years
Acknowledgment: NSF EIN grant ANI

12 Adding Cheetah into existing optical NC university network
[Figure labels: Duke, NCSU, MCNC, UNC, Qwest PoP, Level(3) PoP, NLR, Gb/s transponder, circuit-based gateway, Centaur Lab (John Blondin's cluster computer), National LambdaRail, Internet2 connection, optical fiber, Wavelength Division Mux/Demux (WDM); Mark Johnson, MCNC]

13 National LambdaRail
- Affordable prices for 10Gbps lambdas

14 Cheetah network
[Figure labels: circuit-based gateways (ORNL; MCNC/NLR), each with a control card (implements signaling protocols), OC192 cards, and a GbE/10GbE card; GbE/10GbE Ethernet switch to the Cray at ORNL; GbE/10GbE Ethernet switch to the cluster computer at NCSU; WDM at ORNL, GaTech, and NLR; Atlanta; SOX; NC; OC192; 10GbE; 10 Gbps lambda; link to DC (Dragon)]

15 Connecting Cheetah to Dragon and Ultrascience networks
[Figure labels: Dragon, Cheetah, DOE Ultrascience network (ORNL)]

16 All this is fun, but what are the research problems?
- What happens if the call gets blocked?
- Bandwidth sharing modes
- Low load performance
- Scheduled vs. immediate-request
- Long paths vs. short paths
- Mismatch between multitasking end hosts and TDM circuits

17 What happens if the call gets blocked?
- In TCP/IP networks, your new transfer just joins in and perhaps receives little bandwidth until some other transfers complete
- In circuit-switched networks, your call is accepted or rejected
- If rejected, what then? Call queueing wastes resources because a call spans multiple links
Acknowledgment: NSF ITR grant ANI

18 Practical answer: leverage the presence of an Internet path and fall back to it
- Use second NICs at hosts for circuit connectivity, leaving the primary NIC for Internet access
- Attempt circuit setup; if rejected, fall back to using TCP/IP
- Two paths are available: should we attempt a circuit setup for ALL file transfers?
[Figure: End host I and End host II, connected through both the connectionless Internet and the circuit-switched network]

19 Expected delay on TCP/IP path
Main factors:
- Round-trip time (RTT), mainly T_prop
- Probability of packet loss on the IP path, p
- Bottleneck link rate
Source: J. Padhye, V. Firoiu, D. Towsley, and J. Kurose, "Modeling TCP Throughput: A Simple Model and its Empirical Validation," Proc. of ACM SIGCOMM 98, Aug Sep. 4, Vancouver Canada, pp
Throughput B(p): approximately the reciprocal of expected delay
Other terms:
- W_max: receiver window size
- b = 2 (ACK every other segment)
- T_0: initial time-out

20 Mean TCP delays
[Table: input parameters (loss, round-trip prop. delay) plus the time to transfer a 1GB file and a 1TB file]
- Impact of propagation delay
- Low impact of bottleneck link rate in wide-area networks
- Impact of packet loss rate

21 Delays incurred in using an end-to-end circuit
Circuit setup delay + file transfer delay
- m_sig: signaling message length; r_s: signaling link rate
- Loads on the signaling link (sig) and signaling processor (sp)
- T_sp: signaling protocol processing delay
- k: number of switches; T_prop: round-trip propagation delay
- f: file size; r_c: circuit rate
Acknowledgment: NSF ANI grant for hardware signaling

22 Should the application attempt a circuit setup or not?
- Mean delay if a circuit setup is attempted
- P_b: call blocking probability in the circuit-switched network
- If circuit setup fails, fall back to the Internet path

23 Routing decision

24 Numerical results
[Plots: link rate = 1Gbps; T_prop = 0.1ms and T_prop = 50ms]

25 When r_c = 100Mbps and T_prop = 0.1ms
[Plots: crossover file sizes; r_c = 1Gbps, T_prop = 0.1ms]

26 Utilization considerations
- Example: in the 50ms scenario, if we transfer a 100KB file over a 100Mbps path, the transfer time is only 8ms, so circuit utilization is 8/(50+8), about 13.8%
- Two opposing factors if the crossover file size (beyond which circuit setup is attempted) is increased:
  - per-circuit utilization increases
  - traffic load decreases (Pareto distribution of file sizes), which means aggregate utilization decreases

27 Aggregate utilization
- Quantities: aggregate utilization u_a, traffic load, number of circuits m, call blocking probability P_b
- [Table: m vs. u_a for a 1% call blocking probability; surviving values: 58.2%, 84.6%]
- Assuming file size follows a Pareto distribution, define the fractional offered load (fraction of ):
  - 40KB: 81%
  - 330KB: 71%
  - 80MB: 51%

28 Plot of utilization u with r_c = 100Mbps, k = 20
[Plot curves: P_b = 0.3 and P_b = 0.01]

29 Research problems
What are the research problems?
- What happens if the call gets blocked?
- Bandwidth sharing modes
- Low load performance
- Scheduled vs. immediate-request
- Long paths vs. short paths
- Mismatch between multitasking end hosts and TDM circuits

30 Connection-oriented bandwidth sharing for file transfers
This is a new problem.
- Real-time (interactive) audio-video applications generate data at a certain rate (constant or variable)
  - Implication: the application requests the required bandwidth from the network, and the answer is binary (accept or reject); multiple classes
- File transfers: any bandwidth that the network can provide could be acceptable
  - Implication: the application requests a MAX bandwidth, but the answer can be multi-level

31 Fixing the bandwidth for the transfer could be a bad thing: low load problem
- Varying-bandwidth list scheduling algorithm uses knowledge of the file size to make varying bandwidth allocations for a transfer
- Catch: requires circuit switches to be reprogrammed multiple times within the lifetime of a transfer (circuit)
[Figure: packet switch with capacity C and transfers 1 to N: each transfer gets C/N capacity, and the lone remaining transfer enjoys full capacity C. Circuit switch with capacity C and transfers 1 to N: each transfer is allocated C/N capacity, and the lone remaining transfer continues with capacity allocation C/N]

32 Scheduled vs. immediate-request calls
- Session-type requests: best served with an advance reservation
  - long holding times (2 hours)
  - specific rate
  - remote visualizations
  - scientists participate in sessions
- File transfer requests:
  - file sizes provided, not holding times
  - max rate specified, but any rate can be allocated
  - scientists not involved; just computers
- Small files (e.g., 1 GB on 1 Gbps takes 8 sec) should be handled in immediate-request mode
- Large files (e.g., 1 TB on 1 Gbps takes 2.2 hours) should be handled in scheduled mode
  - Or should we allocate 10Gbps and finish in 800 sec? Immediate-request or scheduled?
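
The transfer times quoted for small and large files follow from dividing the file size in bits by the circuit rate. A minimal Python sketch of that arithmetic, assuming decimal units (1 GB = 10^9 bytes, 1 Gbps = 10^9 bits/s) and ignoring circuit setup delay and protocol overhead; the function name is illustrative and not part of the talk:

```python
GB = 10**9   # bytes, decimal units (assumption)
TB = 10**12  # bytes, decimal units (assumption)


def transfer_time_seconds(file_size_bytes: float, rate_bps: float) -> float:
    """Ideal transfer time on a dedicated circuit: bits to send / circuit rate."""
    return file_size_bytes * 8 / rate_bps


# The three cases mentioned on the slide.
cases = [
    ("1 GB at 1 Gbps", 1 * GB, 1e9),
    ("1 TB at 1 Gbps", 1 * TB, 1e9),
    ("1 TB at 10 Gbps", 1 * TB, 10e9),
]

for label, size_bytes, rate_bps in cases:
    seconds = transfer_time_seconds(size_bytes, rate_bps)
    print(f"{label}: {seconds:.0f} s ({seconds / 3600:.2f} h)")
```

Under these assumptions the 1 TB transfer drops from 8000 s (about 2.2 hours) at 1 Gbps to 800 s at 10 Gbps, which is why the choice of rate blurs the line between scheduled and immediate-request handling.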
33 Specific research activities
- Scheduling
  - Design and compare algorithms for scheduling session-type and large file transfer requests
  - Use preemption and repositioning of large file transfer requests
- Immediate-request call admission algorithms
  - Use Markov Decision Process (MDP) tools to balance fairness and overall throughput
  - Long-path and short-path calls
  - Large-file (high-BW) and short-file (low-BW) calls
  - Multi-level answer rather than binary accept/reject
- Both with fixed bandwidth and varying bandwidth

34 Research problems
What are the research problems?
- What happens if the call gets blocked?
- Bandwidth sharing modes
- Low load performance
- Scheduled vs. immediate-request
- Long paths vs. short paths
- Mismatch between multitasking end hosts and TDM circuits

35 Mismatch between multitasking end hosts and TDM circuits
- Variability in sender: other processes (e.g., Matlab) + disk access (disk head location)
- Variability in receiver: if the buffer is not emptied out, data loss occurs
[Figure: two end hosts joined by the circuit-switched network; on each host, file transfer and Matlab run in user space above the kernel (filesystem, network protocols) and the network card]

36 Effects of mismatch in nature of circuits and nature of hosts
- Choose a high circuit rate and the receive buffer can overflow, causing losses; this impacts delay and utilization (retransmissions)
- Choose a low circuit rate and delay can be high
- If the sending rate is not matched exactly with the circuit rate, the circuit lies idle; utilization is impacted

37 Fixed Rate Transport Protocol (FRTP)
- Set up a circuit at a carefully chosen rate
- Send data at that rate
  - It is hard to meter out data at a fixed rate from a multitasking sender when that rate is high (Linux system time granularity: 10ms)
- No changes of sending rate, i.e., no flow control or congestion control
- Packet losses recovered through retransmissions
  - No timers needed, just negative ACKs, because of in-sequence delivery

38 Experimental results
[Plot axes: relative transfer delay and circuit utilization (%) vs. circuit rate (Mbps)]

39 Current work
- Experimenting with RT schedulers to schedule the file transfer task in a set rhythm
- Experimenting with file systems to characterize file write time
- Purpose: collect data to then determine the circuit rate and receive buffer size

40 Back to the outline
Outline:
- Problem statement
- CHEETAH: an NSF-funded experimental project
- Research problems
- The "internet" name in the title

41 Connection-oriented networks
Many flavors: circuit switched
- Time Division Multiplexed (SONET)
  - Equipment vendors: Sycamore, Ciena
  - Networks: Cheetah, UltraScience Net, CA*net 4
- Wavelength Division Multiplexed (WDM)
  - Equipment vendors: Movaz, Calient, LambdaOptical
  - Networks: Dragon, OMNInet, Internet2 HOPI

42 Connection-oriented networks
Many flavors: packet switched
- Multiprotocol Label Switching (MPLS)
  - Equipment vendors: Cisco, Juniper
  - Networks: Internet2, ESnet
- Virtual Local Area Network (VLAN)
  - Equipment vendors: Dell, Intel, Foundry, Extreme
  - Networks: enterprise local area networks
- Just need to enable connection-oriented networking through already deployed boxes

43 Bandwidth sharing problem in heterogeneous network
- Problem: the tradeoff between fairness and utilization becomes more difficult when these crossconnect granularities are considered
[Figure: network with nodes a, b, c, d, e, f and links labeled 2, 30Mbps; 5, 100Mbps; 1, 150Mbps; 1, 10Mbps; 1, 50Mbps; 1, 500Mbps; 2, 50Mbps; 1, 50Mbps. Request for a 30Mbps connection. Switch granularity: 1Mbps, 51Mbps, 10Gbps]

44 Interconnecting these networks
Tricky business! Involves many levels of interworking protocols:
- User (data) plane
- Signaling protocols (for connection setup/release)
- Routing protocols (for reachability, topology, loading data dissemination)

45 But
We need to solve this internetworking problem for a true connection-oriented service to flourish!
Acknowledgment: DOE grant

46 Summary
- Rich new set of research problems
- Experimental challenges aplenty!
- Real opportunity to deploy a network
Web site: