1
Experiences with setting up the CHEETAH network
Xuan Zheng, Xiangfei Zhu, Malathi Veeraraghavan
University of Virginia
2
CHEETAH network overview
3
Wide-area circuits
- OC-192 lambda from MCNC between MCNC and the NLR Raleigh PoP (Cisco 15454 MSTPs)
- OC-192 lambda from NLR between the NLR Raleigh PoP and the NLR Atlanta PoP (Cisco 15808 MSTPs)
- OC-192 lambda from SLR between the NLR Atlanta PoP and the SLR/SoX/Telx Atlanta PoP (Movaz)
- 2x1GbE MPLS tunnels from ORNL between the SLR/Telx Atlanta PoP and ORNL; will be upgraded to an OC-192 lambda (Ciena CoreStream)
Purchasing issue: wide-area circuits are not "capital equipment"
4
CHEETAH nodes (circuit gateways)
- Cisco ONS 15454 MSPP in the original proposal
- Bought 10/100 Ethernet, GbE, and OC-3 interface cards
- Watch out for the differences between the three types of Ethernet cards (E-series, ML-series, G-series)
- Powerful TL1/CTC interfaces
- Does not support dynamic circuit setup through GMPLS; needs external signaling software to be deployed in the CHEETAH network (UNI support was claimed, but that is from the client side)
- Now used in our lab with two Cisco GSR 12008 routers for local-area CHEETAH research
5
The circuit gateway
- SDM to TDM; more specifically, Ethernet/VLAN on the user side to SONET on the CHEETAH network side
- Implements GFP and VCAT to map Ethernet signals into SONET signals in an efficient manner (for example, 1 Gbps -> 21xOC-1)
Why GMPLS?
- The value of a network grows exponentially with the number of endpoints (Metcalfe's law)
- The CHEETAH network aims to serve a wide range of applications instead of just scientific applications
- A distributed, dynamic control plane is needed to build a scalable network
6
CHEETAH nodes
- Sycamore SN16000 intelligent optical switch
- GbE, 10GbE, and SONET interface cards (we have OC-192s)
- Networks/nodes are managed through the Silvx NMS; limited support for CLI and TL1 interfaces
- Switch OS (BroadLeaf) implements GMPLS protocols since Release 7.0
- Pure SONET circuits: excellent GMPLS signaling and routing support
- Ethernet-to-SONET GMPLS not officially released yet, but a proprietary solution works!
7
CHEETAH nodes: Sycamore SN16000 intelligent optical switch
- Before purchasing, signaling interoperability testing was conducted between the SN16000 and our RSVP-TE client software at Chelmsford, MA
- Three SN16000s were purchased without the Silvx management software (we would recommend getting the Silvx NMS)
- Promising future upgrades of signaling capability: Ethernet-to-SONET is already available in unofficial patches; LMP, L2SC, unidirectional circuits, CSPF, UNI, etc.
- Purchasing issue: SONET switch equipment vendors' contracts are quite elaborate, very different from data equipment; UVA (state regulations) and Sycamore Networks couldn't reach an agreement!
8
CHEETAH network connection
- Three SN16000s were purchased; they are located at MCNC, the SLR/SoX/Telx Atlanta PoP, and ORNL
- The SN16000s are interconnected by OC-192 circuits
- End hosts are connected to the SN16000s' GbE interfaces by direct fiber, VLANs, or MPLS tunnels
- VLANs and MPLS tunnels were not in the original proposal, but there is a clear need for them, for cost reasons
- Testing shows packet loss/out-of-sequence delivery on these segments!
9
[Figure: NC configuration – User Plane. Diagram of the NCSU/MCNC site: the cheetah-nc SN16000 connects to the Orbitty compute nodes (compute-0-0 through compute-0-4, 152.48.249.2-6) over a 5 Gbps VLAN through a 10G Ethernet switch, to wukong (152.48.249.102) over direct fiber, and to Atlanta over OC-192.]
10
[Figure: Atlanta configuration – User Plane. Diagram of the Atlanta site: the Cheetah-atl SN16000 connects to end hosts Zelda1-Zelda3 (10.0.0.11-13) over direct fibers, to Zelda4/Zelda5 (10.0.0.14-15) at GaTech over 1GbE MPLS tunnels through a Juniper router, to ORNL over 2x1GbE MPLS tunnels, and to NC over OC-192.]
11
Things learned during node installation and maintenance
- Rack installation
  - Physical dimensions: 19" or 23" width? height? depth? Two-post or four-post rack?
  - Not just the SONET gateway: need space for other equipment, such as PCs, Ethernet switches, console servers, PDUs, etc.
- Power supply
  - Need careful power calculations; allow for growth
  - DC power for the switch: voltage, current, etc.; AC for other equipment: current, connector type, etc.
- Remote power management
  - Need the capability to remotely (through the Internet) power cycle equipment
  - Switched PDU (Power Distribution Unit) for AC power
12
Things learned during node installation and maintenance (cont.)
- Remote management
  - Internet access
  - Console server: connect the serial ports on the SN16000s to PCs to allow remote management when a SN16000's Ethernet management port is down
  - Need better specification of remote manual support from collocation service providers
- Network security
  - Protect the switch from Internet attacks; provide integrity and authentication of control-plane traffic
  - Our solution: a Juniper NetScreen-5XT firewall/VPN device (hardware) for the SN16000s and Openswan software for the Linux end hosts; CHEETAH control-plane traffic is protected by IPsec tunnels, which also allow the use of private IP addresses behind the firewall device
- Other considerations
  - Network measurement: need an Ethernet hub (old style) or a high-end Ethernet switch
13
Things learned for CHEETAH end hosts
- End hosts: CPU (frequency, single or dual, cache size), bus speed, memory (size, speed), disk (volume, speed, SATA or SCSI, RAID)
- GbE interface card (NIC)
  - Optical or copper; bus type: PCI, PCI-X; connector: SC, LC
  - Protocol: 1000Base-T, SX, LX (needs to match the protocol of the Ethernet interfaces on the CHEETAH nodes)
- Operating system: Windows, Linux, Unix, etc.; Linux kernel version
- Software
  - Security software: VPN client, ssh/ssl libraries
  - Development tools: gcc compiler
  - Network tools: Iperf
14
Control-plane design
- More difficult than user-plane design: must consider the requirements of GMPLS protocols, security, and robustness
- DCC in-band or out-of-band signaling? Must be out-of-band because end hosts are involved
- Security: control-plane traffic must be protected
- Private or public IP addresses?
- Part of the Internet routing domain, or endpoints?
15
[Figure: NC configuration – Control Plane. The cheetah-nc switch (router ID = switch IP = 192.168.5.1) reaches the Internet through a NetScreen-5 firewall (Ethernet port 128.109.34.18, 192.168.4.2); the Orbitty compute nodes (compute-0-0 through compute-0-4, 128.109.45.160-164) attach over 5 x 1 Gbps TE links, wukong (128.109.34.20) over a 1 Gbps TE link, and an OC-192 TE link runs toward ORNL.]
16
[Figure: Atlanta configuration – Control Plane. The Cheetah-atl switch (router ID = switch IP = 192.168.3.1) reaches the Internet through NetScreen-5 firewalls (Ethernet port 130.207.252.138, 192.168.2.2; ORNL side 198.124.42.3); end hosts Zelda1-Zelda3 (130.207.252.131-133) attach over 3 x 1 Gbps TE links, Zelda4/Zelda5 (10.1.1.4/5) over a 1 Gbps TE link, and an OC-192 TE link runs to cheetah-nc.]
18
Outline
- Optical network signaling overview
- Sycamore SN16000 and Cisco 15454
- End host software for GMPLS signaling
- External GMPLS signaling engine for Cisco 15454
- Demo
- Conclusion and future work
19
Optical network signaling
- GMPLS: work in progress at the IETF
  - RFC 2205: RSVP for IP networks
  - RFC 3209: RSVP-TE for MPLS
  - RFC 3471 & RFC 3473: RSVP-TE for GMPLS
  - RFC 3946: SONET and SDH
- Optical Internetworking Forum (OIF): UNI, I-NNI, E-NNI
- International Telecommunication Union (ITU): G.8080 (G.ASON)
20
Vendor support for GMPLS
- Some vendors provide varying levels of GMPLS support in their products, e.g., the Ciena CoreDirector and Sycamore SN16000
- Successful multi-vendor GMPLS interoperability demos at ISOCORE and Supercomputing 05
- The GMPLS signaling implementations of different vendors are basically compatible
21
University GMPLS code
- KOM RSVP Engine, Technische Universitat Darmstadt [7]: partial support of RFCs 2205, 2210 & 3209
- Dragon RSVP-TE code, MAX/ISI [4]: partial support of RFCs 3471, 3473 & 3946; integrates with OSPF-TE
22
Work at UVA
- Implemented GMPLS software for end hosts: the end-host RSVP-TE client
- Integrated GMPLS signaling with admission control and OCS (Optical Connectivity Service) (not fully done)
- Integrated with the 15454 control software
- Interoperability testing with the Sycamore SN16000
- Added CAC and route-computation support to the VLSR (work at CUNY)
23
SN16000: optical control-plane features
- GMPLS RFCs and drafts
  - RFC 3471: GMPLS Signaling Functional Description
  - RFC 3473: GMPLS Signaling RSVP-TE Extensions
  - (Draft) Routing Extensions in Support of GMPLS
  - (Draft) OSPF Extensions in Support of GMPLS
  - (Draft) Generalized Multi-Protocol Label Switching Architecture
  - (Draft) GMPLS Extensions for SONET and SDH Control
  - (Draft) Framework for GMPLS-based Control of SDH/SONET Networks
- In-fiber (in-band) and out-of-fiber (out-of-band) control plane
- Fiber-switch-capable support enables communication with FSC devices (in addition to TDM devices)
- OIF UNI 1.0
(Slide from Sycamore)
24
Cisco 15454
- Cisco SONET Multiservice Provisioning Platform (MSPP)
- Doesn't support GMPLS signaling
25
End host context
- Routing decision: decide whether to use a CHEETAH circuit or the Internet to transfer the file, based on Internet congestion status and file size
- FRTP: Fixed-Rate Transport Protocol designed for circuit-switched networks [6]
- RSVP-TE client: dynamic provisioning of the circuit
[Figure: end host architecture [3]. Each PC runs FTP over the routing-decision module, with TCP/IP on NIC I toward the Internet, and FRTP plus the RSVP-TE client on NIC II toward the CHEETAH network.]
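The routing decision described above can be sketched as a simple policy function. This is an illustrative sketch, not the CHEETAH implementation; the thresholds and the use of loss rate as the congestion measure are assumptions.

```python
# Illustrative sketch of the end-host routing decision (not the actual
# CHEETAH code): prefer the dedicated circuit for large files or when
# the Internet path looks congested, otherwise use the Internet path.
# The threshold values below are assumptions for illustration.

def choose_path(file_size_bytes, internet_loss_rate,
                large_file_threshold=100 * 1024 * 1024,
                loss_threshold=0.01):
    """Return 'circuit' or 'internet' for a pending file transfer."""
    if file_size_bytes >= large_file_threshold:
        return "circuit"    # large transfers justify the circuit setup delay
    if internet_loss_rate > loss_threshold:
        return "circuit"    # congested Internet path: prefer the circuit
    return "internet"       # small file, clean path: avoid setup overhead

print(choose_path(500 * 1024**2, 0.0))   # large file -> circuit
print(choose_path(1 * 1024**2, 0.0))     # small file, no loss -> internet
```

A real policy would also weigh the circuit setup delay (around 4.5 s, per the performance slide) against the transfer time saved.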
26
End-host RSVP-TE client software architecture
[Figure: block diagram of the end host. Applications call the routing-decision module and bwrequestor through bwlib; the bwmgr daemon reads a configuration file (control-plane address, edge-switch address, link type, ...) and contains the OCS, CAC, and socket-interface modules; it drives RSVPD through RSVP_API, with an RSVP client process pool and an RSVP receiver; FRTP also uses OCS.]
27
bwrequestor
- Command for end users to request a circuit:
  bwrequestor DESTINATION-DOMAIN-NAME BANDWIDTH
28
bwmgr daemon
- Reads its configuration from a configuration file
- Initiating circuit setup:
  - Accept the circuit request
  - Check whether the destination is in the CHEETAH network (OCS)
  - Bandwidth management (CAC)
  - Create an RSVP session and send out a PATH message
  - Update the ARP/IP tables for the data plane
- Accepting circuit setup requests:
  - Register a default session with RSVPD and listen for PATH messages
  - If it is a new session: fork a new process, create a new session, update the CAC table, and send back a RESV message
  - Update the ARP/IP tables for the data plane
29
bwmgr configuration file
- The configuration file includes:
  - The control-plane address of the node
  - The address of the edge switch the node is connected to
  - TE-link information: local data-plane interface; link type (Ethernet/SONET) and bandwidth; interface types (numbered or unnumbered) of the two interfaces (local and remote); IP (numbered interface) / IFID (unnumbered) of each interface
- A sample configuration file:
  CTRL-PLANE-IP = 130.207.252.131
  EDGE-ROUTER-IP = 192.168.2.2
  # TE-Links
  # TELink <data-plane interface> <link type (0-Ethernet, 1-SONET)> <bandwidth (Mbit)> <local interface type (0-unnumbered, 1-numbered)> <local interface IP/ID> <remote interface type> <remote interface IP/ID>
  TELink eth2 0 1000 0 1 0 1
30
bwmgr library
- Provides two interfaces: BWRequest() to set up circuits and BWTeardown() to tear them down
- Can be easily integrated with user applications
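The slide names only the two entry points, BWRequest() and BWTeardown(). The sketch below shows how an application might wrap them; the signatures, return values, and host name are assumptions, and the calls are stubbed since the real library talks to the bwmgr daemon.

```python
# Hypothetical sketch of wrapping the bwlib calls named on the slide.
# Signatures and return values are assumptions, stubbed so that the
# control flow (set up, transfer, always tear down) is clear.

def bw_request(dest_domain_name, bandwidth_mbps):
    """Stub for BWRequest(): ask bwmgr to set up a circuit."""
    # A real implementation would contact the bwmgr daemon over its
    # socket interface and block until the circuit is established.
    return {"dest": dest_domain_name, "rate": bandwidth_mbps, "up": True}

def bw_teardown(circuit):
    """Stub for BWTeardown(): release the reserved circuit."""
    circuit["up"] = False
    return circuit

def transfer_with_circuit(dest, rate_mbps, send_file):
    circuit = bw_request(dest, rate_mbps)
    try:
        send_file(circuit)      # application data moves over the circuit
    finally:
        bw_teardown(circuit)    # always release the reserved bandwidth
    return circuit

c = transfer_with_circuit("example-end-host", 1000, lambda c: None)
print(c["up"])                  # False: circuit torn down after the transfer
```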
31
External GMPLS engine for equipment without GMPLS capability
- Dragon's VLSR (Virtual Label Switching Router) [4] as an external GMPLS engine: RSVP-TE message parsing and construction, plus a fabric-programming module for some Ethernet switches through SNMP
- Adapting the VLSR for the Cisco 15454:
  - The Monfox TL1 library provides an interface for an external program to provision circuits by issuing TL1 commands to the 15454
  - Difficulty: the library is in Java while the Dragon code is in C++; we figured out how to integrate Java code with C++ through CNI (Cygnus Native Interface) (by Lingling Cui)
  - Integrated Dragon's RSVP-TE software with the 15454 control software
32
External GMPLS engine for Cisco 15454
[Figure: a control-plane PC runs the VLSR (RSVPD) and a 15454 controller (Monfox DynamicTL1); PATH/RESV messages arrive at the VLSR, which provisions the Cisco 15454's data plane over TL1.]
33
[Figure: demo configuration spanning both sites. Wukong (control plane 128.109.34.20, data plane 152.48.249.102) at NC and Zelda1 (control plane 130.207.252.131, data plane 10.0.0.11) at Atlanta are connected through the two SN16000s over the OC-192 circuit, with control traffic passing through the NetScreen-5 firewalls and the Internet.]
34
Performance
- Average end-to-end setup delay is around 4.5 seconds
- A detailed delay breakdown has not been measured
35
Experiment with the Cisco 15454
[Figure: lab setup. PC I and PC II each have two NICs; their NIC II ports connect through the Cisco 15454, whose cross-connects are provisioned over TL1 by the VLSR (RSVPD) and the 15454 controller (Monfox DynamicTL1) running on a control PC.]
36
Performance of the external GMPLS engine for the MSPP [5]
- Time for cross-connect setup: STS-1: 17.833 ± 0.184 ms; STS-3: 18.000 ± 0.081 ms
- Time for cross-connect delete: STS-1: 16.400 ± 0.175 ms; STS-3: 16.300 ± 0.145 ms
37
Conclusion
- It is feasible to extend dynamic circuits to end hosts by running RSVP-TE software on the end hosts
- It is feasible to add GMPLS signaling capability to devices without built-in GMPLS capability
- The standards are mature and the vendor implementations are good
38
Future work
- Development
  - Alpha and beta testing; hand it out to scientists to use
  - Finish developing the VLSR at CUNY
- Research
  - Bandwidth scheduling for circuit-switched networks
  - Immediate calls vs. scheduled calls
  - Distributed bandwidth scheduling
39
Thank you!
40
References
[1] http://cheetah.cs.virginia.edu/
[2] CHEETAH overview, John H. Moore, Xuan Zheng, Malathi Veeraraghavan, http://cheetah.cs.virginia.edu/networks/Cheetah%20Overview.jpg
[3] CHEETAH network, Malathi Veeraraghavan, Nagi Rao, July 7, 2004
[4] http://dragon.east.isi.edu/
[5] External Switch Control Software, Lingling Cui, CHEETAH project year 1 demo, September 01, 2004
[6] X. Zheng, A. P. Mudambi, and M. Veeraraghavan, "FRTP: Fixed Rate Transport Protocol -- A modified version of SABUL for end-to-end circuits," PathNets 2004 at BroadNets 2004, Sept. 2004, San Jose, CA
[7] KOM RSVP Engine, http://www.kom.e-technik.tu-darmstadt.de/rsvp/
[8] Monfox DynamicTL1 SDK, http://www.monfox.com/dtl1_sdk.html
41
Acronyms
CHEETAH – Circuit-switched High-speed End-to-End Transport ArcHitecture
RSVP – Resource Reservation Protocol
RSVP-TE – RSVP with Traffic Engineering
GMPLS – Generalized Multi-Protocol Label Switching
SONET – Synchronous Optical NETwork
SDH – Synchronous Digital Hierarchy
IETF – Internet Engineering Task Force
RFC – Request for Comments
UNI – User-Network Interface
I-NNI – Internal Network-Network Interface
E-NNI – External Network-Network Interface
42
Transport protocol for dedicated circuits
43
Outline
- Transport protocol functionality
- Requirements for dedicated circuits
- User-space implementation
- Kernel-space implementation
44
Transport protocols
- End-to-end functionality falls within the transport layer's domain: error control, congestion control, flow control
45
Error control
- End-to-end reliability: no missing data, no reordering, no duplicates
- Data packets get lost (thrown away due to lack of buffer space); bit errors corrupt packets in transit
- Error control: infer that something is wrong and take steps to correct it
- Retransmit missing packets, buffer out-of-order data until the missing data arrives, suppress duplicates
46
Congestion control
- Avoid or reduce the damage from network buffers overflowing
- Inferring that the network is congested:
  - explicit signals from the node where buffers overflow
  - guesses from indirect signals like packet loss and RTT variation
- To reduce congestion, slow the rate at which packets are put into the network
47
Flow control
- Avoid over-running the receiver: if the receiver is the bottleneck, maintain a sending rate that matches the receiving rate
- The receiver signals its willingness to accept more data; the sender sends data only when the receiver is ready
48
Transport protocol for dedicated circuits
- Error control: errors can still occur, so error control is needed
- Congestion control: bandwidth is reserved in the network, so there is no congestion as long as the sender stays below the reserved circuit rate
  - Explicit congestion signals will never come, so they are no problem; inferring congestion from packet loss, however, wrongly treats loss from other causes as congestion
- Flow control: receiver over-run must still be prevented, so flow control is required
49
User-space implementation
- UDP: a minimal transport protocol with no error/congestion/flow control
- UDP: an interface to the IP layer with low overhead; perfect for adding extra functionality as needed
- Many variations of UDP-based protocols: SABUL, Hurricane, Tsunami, UDT, ...
50
User-space implementation
- We took SABUL and modified it for our needs
- SABUL: Simple Available Bandwidth Utilization Library
- Uses UDP for the data transfer; adds reliability and congestion/flow control via a control channel that runs over TCP
51
SABUL protocol: error control
- Sequence numbers are added to the UDP packets
- On getting out-of-order packets, the receiver sends a NAK control packet
- The sender keeps unacknowledged packets in memory and retransmits missing packets on getting a NAK
- The receiver sends periodic ACKs to allow the sender to free buffer space
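The NAK-and-retransmit cycle above can be shown with a toy simulation. This is an illustrative sketch of the mechanism, not the SABUL code; the packet count and loss probability are arbitrary.

```python
import random

# Toy simulation of SABUL-style error control: the sender numbers its
# UDP packets and keeps unacknowledged data buffered; the receiver NAKs
# the sequence-number gaps it sees, and the sender retransmits them.

random.seed(1)
TOTAL = 50
send_buffer = {seq: f"pkt{seq}" for seq in range(TOTAL)}  # unACKed packets
received = {}

to_send = list(range(TOTAL))          # first pass: send everything
while to_send:
    for seq in to_send:
        if random.random() > 0.2:     # simulate 20% packet loss
            received[seq] = send_buffer[seq]
    # the receiver's NAK lists the gaps; the sender retransmits them
    to_send = [seq for seq in range(TOTAL) if seq not in received]

# a periodic ACK would now let the sender free send_buffer entries;
# the receiver delivers the buffered data in sequence order
in_order = [received[seq] for seq in range(TOTAL)]
print(len(in_order))                  # 50: reliable despite lossy delivery
```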
52
SABUL protocol: congestion control / flow control
- The receiver periodically sends a SYN control packet with the number of packets received since the last SYN
- The sender uses this information to estimate the loss rate and adjusts the sending rate based on it
- The sending rate is adjusted by changing the inter-packet gap
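A minimal sketch of this loss-driven rate adjustment follows. The adjustment factors and loss threshold are illustrative assumptions, not SABUL's actual constants.

```python
# Sketch of SABUL-style rate control: derive a loss rate from the
# receiver's periodic SYN report and adjust the inter-packet gap.
# The constants (threshold, scaling factors) are assumptions.

def update_gap(gap_us, pkts_sent, pkts_reported, loss_limit=0.001):
    """Return the new inter-packet gap in microseconds."""
    loss_rate = 1.0 - pkts_reported / pkts_sent
    if loss_rate > loss_limit:
        return gap_us * 1.125       # loss too high: widen the gap (slow down)
    return gap_us * 0.99            # clean period: narrow the gap slightly

gap = 10.0                          # start with 10 us between packets
gap = update_gap(gap, 1000, 990)    # 1% loss reported -> gap widens
print(round(gap, 2))                # 11.25
gap = update_gap(gap, 1000, 1000)   # no loss -> gap narrows a little
```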
53
SABUL implementation
- Implemented in C++; provides an interface similar to the socket interface: connect, send, receive
- One thread for the application generating data to send (or consuming received data)
- One thread to send (or receive) the data and handle the control channel
- An accurate inter-packet gap is maintained by busy-waiting
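Busy-wait pacing can be sketched as follows (in Python for brevity; SABUL itself is C++). Spinning on the clock gives an accurate gap but burns CPU cycles, which is the drawback noted on the next slide.

```python
import time

# Sketch of busy-wait pacing: spin on a high-resolution clock until the
# inter-packet gap has elapsed, then send the next packet.  Accurate at
# sub-millisecond scales, but it consumes a CPU core while spinning.

def paced_send(packets, gap_s, send):
    next_time = time.perf_counter()
    for pkt in packets:
        while time.perf_counter() < next_time:
            pass                     # busy-wait: no sleep, just spin
        send(pkt)
        next_time += gap_s           # schedule the next transmission

sent_at = []
paced_send(range(5), 0.001, lambda p: sent_at.append(time.perf_counter()))
gaps = [b - a for a, b in zip(sent_at, sent_at[1:])]
print(all(g >= 0.0009 for g in gaps))   # gaps honour the 1 ms target
```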
54
From SABUL to FRTP
- Stripped out the rate-alteration part to avoid leaving the reserved circuit idle
- Added support for using the secondary NIC for the data channel
- Ran experiments to find optimum values for the many tunable parameters
55
Drawbacks of SABUL
- Flow control: no way to prevent receiver buffer overflow
- Busy-waiting to maintain the inter-packet gap uses up CPU cycles and makes the sending process sensitive to other processes
- Error-control efficiency: many packets were retransmitted unnecessarily
56
Kernel-space implementation
- Flow control is essential because the receiver cannot guarantee that its buffer will be emptied at the rate it is filled
- TCP has window-based flow control
- TCP's self-clocking can maintain a sending rate more efficiently: interrupts from the NIC allow new packets onto the network, so the scheduler has less impact on packet transmission
57
Kernel-space implementation
- TCP congestion control is not required, though
- Congestion window (cwnd): the amount of data the network can sustain
- Receiver-advertised window (rwnd): the amount of data the receiver has buffer space for
- A TCP sender tries to keep min(cwnd, rwnd) of outstanding data in the network
- Congestion control is just a way to estimate the value of cwnd
58
TCP congestion control
- Slow start: start with cwnd set to a small value (2-4 packets) and increase it by one packet for every ACK received, roughly doubling it each RTT
- Congestion avoidance: once cwnd crosses a threshold, increase it by one packet per RTT
- On inferring loss (triple duplicate ACKs or a timeout), cut cwnd in half
59
FRTP: TCP-based version
- With a dedicated circuit, the amount of data the network can sustain (what cwnd represents) is fixed and known
- Slow start only serves to get cwnd close to the actual network capacity, so it is not required
- The additive increase of congestion avoidance and the multiplicative decrease of loss recovery are not required either
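Since the circuit rate is known, the fixed cwnd is just the bandwidth-delay product. A worked example, with an assumed RTT (the 20 ms and MSS values are illustrative, not measured CHEETAH figures):

```python
# Worked example of the fixed cwnd FRTP can use: on a dedicated circuit
# the bandwidth-delay product (BDP) is known in advance, so cwnd can be
# pinned to it.  The rate, RTT, and MSS values are assumptions.

rate_bps = 1_000_000_000        # 1 Gbps circuit (GbE mapped into SONET)
rtt_s = 0.020                   # assumed 20 ms round-trip time
mss = 1460                      # assumed bytes of payload per segment

bdp_bytes = rate_bps / 8 * rtt_s
cwnd_segments = int(bdp_bytes / mss)

print(int(bdp_bytes))           # 2500000 bytes in flight fills the circuit
print(cwnd_segments)            # 1712 segments
```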
60
FRTP implementation
- We use the Web100 TCP stack, which provides an interface to modify/set some TCP parameters
- Added two new control parameters:
  - one to select whether the TCP socket is connected to a CHEETAH circuit
  - one to provide the value cwnd is to be set to: the bandwidth-delay product (BDP)
61
FRTP utility
- For small-to-medium-sized transfers, the gains from avoiding slow start are substantial; the advantage shrinks as transfer size increases
- By keeping only a BDP's worth of outstanding data, FRTP may get lower throughput if the sending process goes silent: the switches may not have enough extra data in their buffers to cover the silences
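A back-of-envelope calculation shows why the gain is largest for small transfers: slow start spends several RTTs ramping cwnd up to the BDP, and that fixed cost dominates short transfers but amortizes over long ones. The initial window and BDP figures are illustrative assumptions.

```python
import math

# Count the RTTs slow start spends growing cwnd to the circuit's BDP,
# assuming cwnd doubles each RTT from an initial window of 2 segments.
# All numbers are illustrative assumptions, not measured values.

def slow_start_rtts(bdp_segments, init_cwnd=2):
    """RTTs for cwnd to grow from init_cwnd to bdp_segments."""
    return math.ceil(math.log2(bdp_segments / init_cwnd))

# a ~1712-segment BDP (1 Gbps, 20 ms RTT) costs roughly 10 extra RTTs;
# pinning cwnd to the BDP, as FRTP does, skips that ramp entirely
print(slow_start_rtts(1712))
```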
62
File transfer experiment
- Tested different file transfer applications over the CHEETAH testbed: ORNL, TN; SoX/SLR PoP, Atlanta, GA; Centaur lab, NCSU, NC

Host         compute-0-0           zelda3                zelda4                wukong
Location     NCSU                  Atlanta               ORNL                  MCNC
CPU          Dual 2.4 GHz Xeon     Dual 2.8 GHz Xeon     Dual 2.8 GHz Xeon     2.8 GHz Xeon
Memory       2 GB                  2 GB                  2 GB                  1 GB
Disk         SCSI 3x10K rpm RAID0  SCSI 2x10K rpm RAID0  SCSI 2x10K rpm RAID0  SCSI 2x15K rpm RAID0
Kernel       2.6.11                2.4.21                2.4.21                2.6.9
File system  XFS                   EXT3                  EXT3                  EXT3
63
File transfer experiment: results
- Among FTP, SFTP, SABUL, Hurricane, BBCP, and FRTPv1, FTP was seen to have the best performance
- FTP gets the highest throughput and is the most consistent
- Send and receive buffers have to be tuned for best performance