AARNet's experiences using MPLS for protection
Glen Turner, Network [email protected]
Internet2/NLANR Joint Techs Meeting, Boulder, CO, USA
2002-07-28
Australian Academic & Research Network
http://www.aarnet.edu.au/
Topics
MPLS overview
Protection technology
AARNet's experiences with MPLS
Other interesting stuff if we have time
Coverage
MPLS is a big topic with multiple implementation choices at almost every turn
Only discuss some of the technology choices
● MPLS generic tagging, not ATM tagging
● RSVP and not LDP
● OSPF and not IS-IS
Coverage
Discuss the use of MPLS for protection, not discussing some important uses of MPLS
● VPNs (and thus BGP)
● GMPLS, the integrated control layer for switching technologies
“How to speak Australian”
● words with “or” → “our”, “z” → “s”
SONET → SDH (slight framing difference)
T1 → E1 (E1 is 2Mbps)
MPLS aims
Scalable IP traffic engineering
● Avoid need for full IP network knowledge at core
Virtual private network service
● By providing label switch paths exclusive to a customer
This presentation focuses on traffic engineering
● Only beginning to experiment with VPNs
MPLS is a layer 2½ protocol
[Figure: two seven-layer stacks (1 Physical, 2 Link, 3 Network, 4 Transport, 5 Session, 6 Presentation, 7 Application); the second stack adds MPLS as layer 2½]
Advantages of layer 2½
No complex next hop algorithm
● IP address lookup is expensive
  – Closest matching prefix versus table lookup
● IP next hop algorithm gets more complex with each new service
  – Policy routing
  – Multicast
Want GbE switch prices not GbE router prices
New behaviours only affect edge routers
Advantages of layer 2½, cond
No need to follow IP routing
● The shortest path may not be the best path
● Want policy
  – For traffic engineering
    ● Bandwidth
    ● Diverse routers and paths
  – For arbitrary customer requirements
    ● eg: Australian Army doesn't want to be routed over links not owned by Australian-controlled telcos
Advantages of layer 2½, cond
Why MPLS for policy and not BGP?
● BGP is globally visible
  – Scalability: Does outer Mongolia need to know of an interface failure in outback Australia?
  – Can lose connectivity due to dampening, which is essential due to global visibility
● Not all reasonable policies can be expressed in BGP
Disadvantages of layer 2½
Another set of control protocols
● ATM: OAM, ILMI, PNNI
● 802.1Q VLANs: Virtual LAN registration protocol
● SDH/SONET
MPLS uses IP as its control and routing protocol
Layer 2.5 and protection
Network layer protection requires a network layer response
● Limited by convergence time of routing protocol
● Fast convergence and global visibility do not mix
  – BGP rate limiting is an expression of this
Layer 2.5 and protection
Link layer protection requires a link layer response
● These often have constrained topologies
  – SDH/SONET rings
  – 802.1D and parallel links
● They often use protection bandwidth inefficiently
● They often treat all network traffic as equally valuable
● Lack of network topology: poor decisions
Layer 2.5 and protection
Allow network layer to establish pre-routed fallback path
● Full topology awareness
Allow link layer to switch to fallback path
● Not globally visible
● Fast convergence
This could get messy upon multiple failures
● Run interior routing protocol afterwards
Forwarding equivalence class
Another view of IP routing
● Step 1: Determine forwarding equivalence class from IP header (or more)
  – Standard: destination IP address
  – Advanced: source IP address, multicast group, DSCP, TCP port, increasingly bizarre
● Step 2: Look up FEC forwarding table to determine output interface (ie: switch the packet)
Forwarding equivalence class, cond
IP router calculates forwarding equivalence class at every hop
● Expensive
  – either in CPU time or hardware
● Extensive
  – IP forwarding table is big with frequent updates
● Difficult to alter for new behaviours
  – ASIC designers may not have anticipated the change (reverse path lookup, source-specific multicast)
Forwarding equivalence class, cond
MPLS switching
● Determine forwarding equivalence class at ingress
● Tag packet with a fixed-length label for this forwarding equivalence class
● Switch using the label at every other hop to egress
  – Tags are designed for hardware manipulation
Labels are not globally unique
Even one router can run multiple “label spaces”
  – eth0, eth1 in LS1
  – eth2, eth3 in LS2
Edge routers need distinct IP routing tables for each label space
● The key to MPLS VPNs
● We often want multiple routing tables and settle for policy routing instead
MPLS tag
A 32-bit header in front of the packet
Tag contains just enough information for forwarding and queuing
● Unlike IPv4/IPv6 header, which carries a lot more
Tag has hardware-friendly structure
MPLS tag, fields
Label
● Determines next-hop interface
Experimental (QoS)
● Determines output interface queuing
S for “last of stack”
● S=1 on last header
Time to live
● Discard upon zero, otherwise decrement
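The four fields can be illustrated by packing and unpacking one tag. This is only a sketch: the slides give the 32-bit total, while the individual field widths used here (20-bit label, 3-bit Exp, 1-bit S, 8-bit TTL) are taken from the MPLS label stack encoding specification (RFC 3032), and the function names are hypothetical.

```python
import struct


def pack_tag(label: int, exp: int, s: int, ttl: int) -> bytes:
    """Pack one 32-bit MPLS tag: label(20) | Exp(3) | S(1) | TTL(8)."""
    if not (0 <= label < 2**20 and 0 <= exp < 8 and s in (0, 1) and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (exp << 9) | (s << 8) | ttl
    return struct.pack("!I", word)  # network byte order


def unpack_tag(data: bytes) -> dict:
    """Split the first four bytes of data back into the tag fields."""
    (word,) = struct.unpack("!I", data[:4])
    return {
        "label": word >> 12,
        "exp": (word >> 9) & 0x7,
        "s": (word >> 8) & 0x1,
        "ttl": word & 0xFF,
    }
```

For example, `unpack_tag(pack_tag(10, 0, 1, 64))` gives back label 10 with S=1, that is, the last tag of a stack.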
MPLS tag, stacking
An MPLS tagged packet can be tagged again (“stacked”)
● Allows Provider-Provider connections to maintain customer tags
● Simplifies design considerably
● Avoids need for global label space
[Figure: a network-layer packet carrying a stack of tags; one tag has S=1, the others S=0]
MPLS tag, stacking and MTU
The tag may reduce the size of the path maximum transmission unit (PMTU)
● TCP/IP stacks don't cope well with change of PMTU
  – PMTU at establishment of TCP determines TCP MSS
● Best to ensure that main and protect paths have identical tag depths
Or may not, if the link layer will let us flex the rules
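The arithmetic behind the MTU concern can be sketched as follows. The 4-byte tag size follows from the 32-bit header described earlier; the 1500-byte link MTU and the 20-byte IPv4 and TCP headers (no options) are assumptions for illustration only.

```python
TAG_BYTES = 4  # one MPLS tag is a 32-bit header


def ip_mtu(link_mtu: int, tag_depth: int) -> int:
    """Largest IP packet that fits once tag_depth tags are stacked in front."""
    return link_mtu - TAG_BYTES * tag_depth


def tcp_mss(ip_mtu_bytes: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """TCP payload per segment, assuming option-free IPv4 and TCP headers."""
    return ip_mtu_bytes - ip_header - tcp_header


# A main path with one tag and a protect path with two tags differ by 4 bytes
# of MSS, which is why identical tag depths are recommended above.
main_mss = tcp_mss(ip_mtu(1500, 1))
protect_mss = tcp_mss(ip_mtu(1500, 2))
```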
MPLS operation, cond – Label switch router
Incoming packet, look up incoming label map, which contains
● Incoming label
● MPLS opcode: PUSH, POP, etc
● Forwarding equivalence class
● Link to outgoing next hop label entry
MPLS operation, cond – Label switch router
Incoming packet operations
● Extract label from top tag
● Look up incoming label map
● Execute MPLS opcodes to manipulate tags
● Forward packet to outgoing processing
MPLS operation, cond – Label switch router
Outgoing packet, look up next hop label entry, which contains
● Outgoing label
● Outgoing interface
● Perhaps, outgoing per-hop queuing behaviour
MPLS operation, cond – Label switch router
Outgoing packet operations
● Look up next hop label entry
● Create new tag containing outgoing label
● PUSH tag onto label stack
● Add to transmit queue on outgoing interface
  – queuing discipline may depend upon
    ● Value in next hop forwarding entry
    ● Value determined from Exp bits, à la IP DSCP and weighted fair queuing + RED
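The incoming and outgoing steps of a label switch router can be sketched as two table lookups. This is a minimal illustration, not any router's implementation: the class names, the "SWAP"/"POP" opcode strings and the list representation of the label stack (top of stack last) are assumptions, and queuing is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class NextHopEntry:
    """Next hop label entry: what to push and where to send it."""
    out_label: int
    out_interface: str


@dataclass
class IncomingEntry:
    """Incoming label map entry: opcode plus link to the next hop entry."""
    opcode: str  # "SWAP" or "POP" in this sketch
    next_hop: Optional[NextHopEntry] = None


def switch(ilm: Dict[int, IncomingEntry], stack: List[int]) -> Tuple[List[int], Optional[str]]:
    """Return (new label stack, outgoing interface).

    The interface is None when the tag is popped and the packet is handed
    to whatever sits underneath (another tag, or IP routing).
    """
    if not stack:
        raise ValueError("packet carries no tag")
    entry = ilm[stack[-1]]   # extract top label, look up incoming label map
    rest = stack[:-1]        # the incoming tag is consumed
    if entry.opcode == "POP":
        return rest, None
    # Outgoing processing: create a tag with the outgoing label and push it.
    return rest + [entry.next_hop.out_label], entry.next_hop.out_interface


# Hypothetical table: label 10 leaves as label 20 on eth1, label 21 as 11 on eth0.
ILM = {
    10: IncomingEntry("SWAP", NextHopEntry(20, "eth1")),
    21: IncomingEntry("SWAP", NextHopEntry(11, "eth0")),
}
```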
MPLS operation, cond – Ingress label edge router
Incoming packet, look up forwarding equivalence class to next hop label entry (FTN), which contains
● forwarding equivalence class
● next hop label entry
MPLS operation, cond – Ingress label edge router
Incoming packet operations
● Determine forwarding equivalence class using “standard” IP forwarding
  – Basic: look up destination IP address in IP forwarding table
  – Advanced: policy routing, multicast routing, QoS routing, ...
● Use FEC to look up forwarding equivalence class to next hop label entry table
● Process next hop label entry
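The basic ingress case can be sketched with Python's standard `ipaddress` module: the destination address is matched against prefixes (longest match wins) and the winning prefix, acting as the forwarding equivalence class, selects a next hop label entry. The table contents and the `(label, interface)` tuple layout are hypothetical.

```python
import ipaddress

# FEC-to-next-hop-label-entry table: prefix -> (outgoing label, outgoing interface)
FTN = {
    ipaddress.ip_network("10.4.0.0/16"): (10, "eth0"),
    ipaddress.ip_network("10.4.8.0/24"): (12, "eth0"),
}


def classify(destination: str, ftn=FTN):
    """Return the next hop label entry for destination, or None to route as plain IP."""
    address = ipaddress.ip_address(destination)
    matches = [prefix for prefix in ftn if address in prefix]
    if not matches:
        return None
    best = max(matches, key=lambda prefix: prefix.prefixlen)  # longest prefix wins
    return ftn[best]
```

The expensive prefix match happens only here at the ingress; every later hop switches on the fixed-length label.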
MPLS operation, cond – Egress label edge router
Next hop label entry shows this router as the penultimate hop
Protocol-dependent actions to simulate label switch routers being real routers
● Decrement IP TTL
● Generate any ICMP which would have occurred
Forward the packet using the standard IP algorithm
Faking ICMP gives interesting results
Traceroute from Glen's home to www.internet2.edu
 1 sadial.sa.csiro.au 119.657 ms 129.673 ms 100.004 ms
 2 sa.gw.csiro.au 119.944 ms 129.829 ms 110.382 ms
 3 lis255.atm1-0.central.saard.net 131.917 ms 119.858 ms 109.980 ms
 4 sa-nsw.atm.net.aarnet.edu.au 139.715 ms 149.829 ms 140.002 ms
 5 vlan916.gbe3-0.sccn1.broadway.aarnet.net.au 149.941 ms 149.773 ms 149.968 ms
 6 pos1-0.sccn1.manoa.aarnet.net.au 349.907 ms 279.791 ms 289.963 ms
 7 pos2-0.sccn1.seattle.aarnet.net.au 279.866 ms 329.880 ms 279.904 ms
 8 Abilene-PWAVE.pnw-gigapop.net 279.870 ms 351.155 ms 328.555 ms
 9 dnvr-sttl.abilene.ucaid.edu 339.933 ms 339.861 ms 329.944 ms
10 kscy-dnvr.abilene.ucaid.edu 349.847 ms 339.622 ms 350.053 ms
11 ipls-kscy.abilene.ucaid.edu 339.756 ms 339.932 ms 339.903 ms
12 clev-ipls.abilene.ucaid.edu 339.884 ms 349.808 ms 339.963 ms
13 nycm-clev.abilene.ucaid.edu 349.752 ms 349.857 ms 339.969 ms
14 border-abilene-oc3.advanced.org 360.135 ms 359.857 ms 379.851 ms
15 www.internet2.edu 379.865 ms 359.838 ms 359.950 ms
Architectural issues
There is a lot of complexity at the edge
● Especially in the egress router
But we want the edge to be cheap, as there is a lot of it
There are no MPLS applications
ATM has applications
● (Today's bizarre but true fact)
Links between 3G base stations and switching points are the most recent application to treat ATM as a transport layer
Even ethernet has applications
● DEC Local Area Transport
There are no MPLS applications
MPLS exists only to carry other protocols
● The label edge routers must support the protocol
● This isn't new
  – All routers have to support the network layer protocol they are routing
Model is strained somewhat by abuse of MPLS to carry ethernet frames
Configuring a label switch router – Linux
Both eth0 and eth1 in label space 1
● mplsadm -L eth0:1
  mplsadm -L eth1:1
Configuring a label switch router – Linux
Configure label switching
● mplsadm -A -I gen:10:1 -O gen:20:ipv4:10.3.0.2 -B
  mplsadm -A -I gen:21:1 -O gen:11:ipv4:10.2.0.1 -B
– -A -B: add and bind
– -I: incoming on eth0, generic tag, label 10
– -O: outgoing on eth1, generic tag, label 20, only if next hop is available
Configuring a label edge router – Linux
Configuration for left-most router
Label space
● mplsadm -L eth0:1
  mplsadm -L eth1:1
Configuring a label ingress router – Linux
Ingress label edge router
Set forwarding equivalence class in routing subsystem
● route add -net 10.4.0.0/16 gw 10.2.0.2
Set FEC in MPLS subsystem
● mplsadm -A -B -O gen:10:eth0:ipv4:10.2.0.2 -f 10.4.0.0/16
  – outgoing label of 10
Egress label edge router
● mplsadm -A -I gen:11:1
Configuring a label egress router – Linux
Egress label edge router
Incoming MPLS packets with label 11 are POPed and escalated to IP routing system
● mplsadm -A -I gen:11:1
Configuring a label edge router – Linux
Label space
● mplsadm -L eth0:1
  mplsadm -L eth1:1
Ingress label edge router
● Forwarding equivalence class is determined by routing sub-system
● route add -net 10.4.0.0/16 gw 10.2.0.2
  mplsadm -A -B -O gen:10:eth0:ipv4:10.2.0.2 -f 10.4.0.0/16
Egress label edge router
● mplsadm -A -I gen:11:1
Representation
How should MPLS look to the network layer?
The preceding is not a good fit
● eth0 has multiple subnets
● eth0 can be partially down
● Routing protocols need considerable work
Representation
A tunnel seems a good fit
● Tunnels run between routers, making intermediate routers invisible
● Tunnels have MTU issues, as does MPLS
● Routing protocols understand tunnels
● Management systems' expectations are met
  – interface either down or up
  – SNMP counters count something useful
Configuring a label ingress router with tunnels – Linux
Create MPLS tagging
● mplsadm -A -O gen:10:eth0:ipv4:10.3.0.2
  – -A -O: add outgoing label
    ● gen:10: generic tag with label 10
    ● eth0: outgoing interface
    ● ipv4:10.3.0.2: address of remote-end of tunnel
Configuring a label ingress router with tunnels – Linux
Create a tunnel interface
● mplsadm -A -T mpls0
  – -A -T: Add tunnel
    ● mpls0: tunnel interface name
Configuring a label ingress router with tunnels – Linux
Assign an IP address to the local end of the tunnel, use the same address as the ethernet interface
● ifconfig eth0 inet addr:10.2.0.1
  ifconfig mpls0 10.2.0.1 netmask 255.255.255.255
– mpls0: tunnel interface to configure
– 10.2.0.1: local-end IPv4 address
Configuring a label ingress router with tunnels – Linux
Bind outgoing label to tunnel
● mplsadm -B -O gen:10:eth0 -T mpls0
  – -B -O: bind outgoing label
    ● gen:10: generic tag with label 10
    ● eth0: interface
  – -T: tunnel
    ● mpls0: tunnel interface name
Configuring a label ingress router with tunnels – Linux
Forward traffic to mpls0 tunnel
● route add -net 10.4.0.0/16 gw 10.3.0.2 dev mpls0
– 10.4.0.0/16: Forwarding equivalence class
– gw 10.3.0.2: remote tunnel-end address
– dev mpls0: next hop interface
Configuring a label egress router with tunnels – Linux
Same as normal egress
● mplsadm -A -I gen:11:1
Configure label edge router – Linux
Configure the rightmost label edge router similarly
We want to do this automatically
● That is, to use a signalling protocol
IP-based signalling and routing
Unusual, most link technologies develop their own signalling and routing
● Ethernet: bridge protocol data unit
  – carries
    ● 802.1D spanning tree
    ● 802.1Q virtual LAN registration protocol
● ATM: OAM, ILMI and PNNI
Signalling
LDP: Label distribution protocol
RSVP: Resource reservation protocol
We'll only discuss RSVP
RSVP
A soft-state protocol for establishing and maintaining IntServ QoS paths
● Sent Path message requests an IntServ path
● Received Resv message confirms an IntServ path request
RSVP, cond
New RSVP objects for MPLS paths
Path message
● LABEL_REQUEST: create a label switched path
● EXPLICIT_ROUTE: through these label switch routers
Resv message
● LABEL: Inserts entry into label switch forwarding table
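How the two messages cooperate can be sketched as a walk along the explicit route: the Path message travels from ingress to egress, and the Resv message returns hop by hop, each router choosing its own incoming label and advertising it upstream. The sketch below only models the Resv pass; the function name, the per-router label counters and the `(incoming, outgoing)` tuple layout are hypothetical.

```python
def signal_lsp(explicit_route, next_free_label):
    """Simulate the Resv pass for one label switched path.

    explicit_route: router names from ingress to egress.
    next_free_label: dict of router name -> next unused label on that router.
    Returns dict of router name -> (incoming label, outgoing label).
    """
    tables = {}
    downstream_label = None  # the egress has nothing to push
    # Resv travels upstream: egress first, ingress last.
    for router in reversed(explicit_route[1:]):
        incoming = next_free_label[router]
        next_free_label[router] += 1
        tables[router] = (incoming, downstream_label)
        downstream_label = incoming  # carried upstream in the LABEL object
    # The ingress receives no tagged packets on this path; it only pushes.
    tables[explicit_route[0]] = (None, downstream_label)
    return tables
```

Because each router picks its own incoming label, no global label space is needed.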
RSVP and traffic engineering
Sometimes don't want the shortest path
● A longer congestion-free path is always better than a shorter congested path
● Bizarre customer requirements
  – eg: ADF and links controlled by non-ANZUS telcos
● Diversity
  – Complex, as lots of failure modes
    ● Don't want to share core, cable, conduit, router, UPS, building, site, block, substation, road, flood plain, craft personnel, jurisdiction
RSVP and diversity
● RSVP has “resource affinities”, roughly 32 per label space
  – Enough for broad-brush use, say for a national backbone
  – AARNet doesn't use this
    ● Our use of MPLS is either too trivial or too complex
RSVP and degraded service
RSVP has a Setup Priority and Holding Priority
● These allow established paths to be pre-empted by a new path
● AARNet considering use for recovery scenarios
  – So we can prioritise use of degraded capacity
  – eg: voice, commodity, research, quality video, multicast
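A rough sketch of how setup and holding priorities could ration degraded capacity follows. It assumes the RSVP-TE convention that numerically lower values are stronger, so a new path may pre-empt an established one only when its setup priority is numerically lower than the victim's holding priority. The bandwidth figures, the mapping of traffic classes to priorities and the function name are all hypothetical.

```python
def admit(capacity, established, new_path):
    """Decide whether new_path fits, pre-empting weaker paths if needed.

    established: list of (name, bandwidth, holding_priority).
    new_path: (name, bandwidth, setup_priority).
    Returns (admitted, names of pre-empted paths).
    """
    _, bandwidth, setup_priority = new_path
    free = capacity - sum(bw for _, bw, _ in established)
    preempted = []
    # Consider the weakest holders first (largest holding priority value).
    for name, bw, holding in sorted(established, key=lambda p: p[2], reverse=True):
        if free >= bandwidth:
            break
        if setup_priority < holding:
            preempted.append(name)
            free += bw
    if free >= bandwidth:
        return True, preempted
    return False, []  # nothing is torn down if the new path still does not fit
```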
RSVP node failure
Hello protocol
● HELLO REQUEST
● HELLO ACK
Detects
● Node down
● Node reboot
  – Thus needs instant path re-establishment
● All links between the two nodes have failed
RSVP node failure, cond
No “alarm hierarchy” of Hellos
● They run on every label switch path
Good
● Alarm hierarchies often fail
  – CPU overwhelmed by massive failure
Bad
● Bandwidth and CPU interrupts
● End-to-end, not segment-based
This won't do for GMPLS
Signalling configures path in one direction
Important that other direction be established :-)
It should follow the same physical segments
● Balakrishnan, Padmanabhan, Fairhurst, et al, TCP performance implications of network path asymmetry
  – draft-ietf-pilc-asym-07
Requirements
We want to specify paths with
● Forwarding equivalence class
● Origin and destination node
● Path placement constraints
So the routing protocol needs to distribute
● Connectivity
● Path attributes to satisfy constraint calculations
Possible constraints – Routers
Support for prioritisation
Support for protocols
Available bandwidth
Link technologies
Protection switching technologies
Possible constraints – Links
Available bandwidth
Reliability
Colour
Cost
Membership of shared link risk group
OSPF implementation
Add new link state advertisement types which contain link attributes
These LSAs should be ignored by standard OSPF: they are “opaque”
There are three new Opaque LSAs, all identical except for flooding scope
Add an OSPF Hello option so neighbours can become Opaque LSA neighbours and pass Opaque LSAs
Structure of the opaque (huh?)
List of TLVs for routers and links
● Type, length, value
● Allows an unsupported variable to be silently ignored
Attributes are held in sub-TLVs
● TLVs within TLVs
Router sub-TLV
● Router ID
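The reason a TLV list lets unsupported items be skipped is that the length field tells the parser where the next item starts, whatever the type. A minimal parser sketch follows; the 16-bit type, 16-bit length and padding of values to a 32-bit boundary are assumptions taken from the OSPF traffic engineering extensions specification rather than from the slides, and the `KNOWN` table is illustrative.

```python
import struct

KNOWN = {1: "router address", 2: "link"}  # illustrative subset of TLV types


def parse_tlvs(data: bytes):
    """Return a list of (type, value) pairs from a byte string of TLVs."""
    tlvs = []
    offset = 0
    while offset + 4 <= len(data):
        tlv_type, length = struct.unpack_from("!HH", data, offset)
        value = data[offset + 4: offset + 4 + length]
        tlvs.append((tlv_type, value))
        # Values are padded so that the next TLV starts on a 4-byte boundary.
        offset += 4 + ((length + 3) // 4) * 4
    return tlvs


def supported(tlvs):
    """Silently drop TLVs whose type this implementation does not understand."""
    return [(KNOWN[t], v) for t, v in tlvs if t in KNOWN]
```

Since sub-TLVs use the same layout, `parse_tlvs` can be applied again to the value of a link TLV.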
Structure of the opaque
Link TLV
● Identity sub-TLVs
  – Link type: point-to-point, multi-point
  – Router ID of neighbour
  – Local interface IP address
  – Remote interface IP address
Structure of the opaque
Link TLV
● Traffic engineering sub-TLVs
  – Traffic engineering metric, 32-bit cardinal
  – Maximum bandwidth, 32-bit floating point
  – Maximum reservable bandwidth, 32-bit floating point
  – Unreserved bandwidth, 32-bit floating point
  – Resource colour, 32-bit mask
    ● A “colour” might be a DWDM channel, or an E1 time-slice within an E3, or a ...
Limitations - flooding
Traffic engineering values can change rapidly and repeatedly
● Available bandwidth
Important to limit flooding
Opaque LSAs don't do this nearly as well as they could as there are only three flooding scopes
Limitations - summarisation
Difficult to summarise traffic engineering information
Thus areas are difficult to construct
But areas are vital in limiting flooding
Configuration – Zebra – Interface control
zebra.conf
● interface eth0
   bandwidth 100000
   description Link to LSR
   ip address 10.2.0.1/30
● interface eth1
   description Hosts
   bandwidth 100000
   ip address 10.1.0.1/16
● interface mpls0
   description Tunnel
   bandwidth 100000
   ip address 10.3.0.2/32
   no multicast
   ipv6 nd suppress-ra
Configuration – Zebra – OSPF router
ospfd.conf
● router ospf
   ospf router-id 10.2.0.1
   auto-cost reference-bandwidth 10000
   area 0 authentication message-digest
   network 10.1.0.0/16 area 0
   network 10.2.0.0/30 area 0
   network 10.3.0.2/32 area 0
   neighbor 10.3.0.2
   capability opaque
   mpls-te
   mpls-te router-address 10.2.0.1
Configuration – Zebra – OSPF interfaces
ospfd.conf
● interface eth0
   ip ospf network broadcast
   ip ospf authentication message-digest
   ip ospf message-digest-key ... md5 ...
   mpls-te link metric 0
   mpls-te link max-bw 1e+07
   mpls-te link max-rsv-bw 5e+06
   mpls-te link rsc-clsclr 0x1
● interface eth1
   ip ospf network broadcast
   ip ospf authentication message-digest
   ip ospf message-digest-key ... md5 ...
Configuration – Zebra – OSPF tunnel interface
ospfd.conf
● interface mpls0
   ip ospf network point-to-point
   ip ospf authentication message-digest
   ip ospf message-digest-key ... md5 ...
MPLS is improving OSPF
Dynamic shortest path first algorithms
  – About 10% of full-DB Dijkstra
Hitless restart
  – Remove assumption OSPF comes up in quiescent network
Graceful handling of failure
  – Database overflow
  – Rate limiting
    ● Especially of flapping interfaces
AARNet's load share configuration – South
● interface POS1/0
   description Seattle-Sydney SDH
   ip address 192.231.212.34 255.255.255.252
   ip ospf cost 128
   mpls traffic-eng tunnels
   mpls traffic-eng backup-path Tunnel8204
   tag-switching ip
   pos ais-shut
   pos report lrdi
   ip rsvp bandwidth 150000 150000
   ...
AARNet's load share configuration – North
● interface POS2/0
   description Seattle-Manoa SDH
   ip address 192.231.212.162 255.255.255.252
   ip ospf cost 64
   mpls traffic-eng tunnels
   mpls traffic-eng backup-path Tunnel 8203
   tag-switching ip
   pos ais-shut
   pos report lrdi
   ip rsvp bandwidth 150000 150000
OSPF design hints – Use current best practice
● Small area 0, consistent with TE
  – Area 0 has total network knowledge
● Using areas allows address aggregation
  – Most importantly this aggregates network state
  – Addressing needs to be thought out in advance
[Figure: OSPF areas 0, 1 and 2 covering Sydney core, Manoa, Seattle and Wollongong core]
OSPF design hints, cond
● Loopback interface as router ID
  – Make this a /32
● Broadcast and loop media have an advantage
  – Only two routers in subnet (DR and BDR) track area state
● Don't redistribute
  – Use network statements
  – You'll end up with a lot of these so use a Perl script
● Use MD5 authentication
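Generating the network statements from an interface inventory is easy to script. The slides suggest Perl; the sketch below uses Python only to stay consistent with the other examples here. The input format (a list of interface address and area pairs) is an assumption, and the output follows the `network ... area ...` form shown in the ospfd.conf slide.

```python
import ipaddress


def network_statements(interfaces):
    """Turn (interface address in CIDR form, area) pairs into OSPF network lines."""
    lines = set()
    for address, area in interfaces:
        network = ipaddress.ip_interface(address).network  # mask off the host bits
        lines.add(f" network {network} area {area}")
    return sorted(lines)  # duplicates collapse, order is stable


# Hypothetical inventory matching the addresses used in the Zebra example.
inventory = [("10.2.0.1/30", 0), ("10.1.0.1/16", 0), ("10.3.0.2/32", 0)]
statements = network_statements(inventory)
```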
Fast re-route – Basic mechanism
Fast re-route
● Detect fault using
  – Link layer carrier loss
  – RSVP Hello timeout (150ms)
● Signal failure using RSVP ResvTear message
● Change to pre-established label switch path
● Recalculate optimal paths by running OSPF
RSVP messages
Path FAST_REROUTE
● Request a path protected with a fast re-route path
Path DETOUR
● Request a fast re-route path
Two modes of operation
J: LSP oriented
● Establish a detour LSP to protect one other LSP
● Upon failure switch packets to the detour LSP
Two modes of operation
C: Tunnel oriented
● Establish a tunnel to protect other tunnels
● Upon failure send the packets through the tunnel
  – pushing onto the label stack
● One backup tunnel can protect many other tunnels
These don't interoperate. Ouch
OSPF run to clean up
A multiple failure may not lead to a sane topology
OSPF is run to route all active main and detour LSPs optimally
Need to rate limit how often this is done
● else intermittent interface failures will use more CPU than they deserve
Tunnel-style fast re-route
The main LSP
● interface POS1/0
   description Seattle-Sydney fiber
   ip address 192.231.212.34 255.255.255.252
   mpls traffic-eng tunnels
   ! Seattle-Manoa protect
   mpls traffic-eng backup-path Tunnel8204
   tag-switching ip
   pos ais-shut
   pos report lrdi
   ip rsvp bandwidth 150000 150000
   ...
Tunnel-style fast re-route
The backup tunnel
● interface Tunnel8204
   description Seattle-Manoa backup
   ip unnumbered Loopback0
   tag-switching ip
   ! Loopback0 on manoa
   tunnel destination 192.231.212.148
   tunnel mode mpls traffic-eng
   tunnel mpls traffic-eng priority 0 0
   tunnel mpls traffic-eng path-option 1 explicit name sea-haw
   ...
Topic – AARNet's experiences
● Configuration
● Protection
● Measurement
● Properties of international links
● Future
“You are in one of a large number of tunnels, all seemingly alike”
The number of MPLS paths explodes quickly
It took some time and a lot of care to get all the tunnels established
Managing tunnels
Naming conventions
Only the beginning of automated tools
● These tend to be proprietary rather than general, and driven from a GUI rather than a database
Had to build a lot of our own tools
● SNMP program to check all LSPs had reverse LSP
● Wanted to write more but insufficient router MIBs
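The core of such a reverse-path check is small once the list of label switch paths has been collected. The sketch below assumes the paths have already been gathered (for example by SNMP, which is not shown) into `(origin, destination)` pairs; the function name and data are hypothetical and this is not AARNet's actual tool.

```python
def missing_reverse_paths(lsps):
    """Return the (origin, destination) pairs that would be needed to give
    every listed label switch path a matching path in the other direction."""
    pairs = set(lsps)
    return sorted((dst, src) for src, dst in pairs if (dst, src) not in pairs)


# Hypothetical inventory: Sydney to Manoa has no return path configured.
configured = [
    ("sydney", "seattle"),
    ("seattle", "sydney"),
    ("sydney", "manoa"),
]
problems = missing_reverse_paths(configured)
```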
Topic – AARNet's experiences
● Configuration
● Protection
● Measurement
● Properties of international links
● Future
Restoration
MPLS performance should be worse than SDH performance
● MPLS is end-to-end protection and the link latency is 80ms
● SDH has section protection, longest section is 40ms
Restoration
This was true in practice
● Still not enough time for a phone user to hang up
● Too long to be used to switch routers in and out of working path
  – Want to do this for software upgrades
Performance under stress
MPLS restoration was better behaved than SDH when things fell apart
● AARNet's network management system has a sophistication that the SDH systems do not have
  – This leverages off the work on monitoring generic IP links
● We could detect and isolate odd conditions before they threatened service
SDH in practice
SDH alarms can overwhelm management console
● Some vendors have poor isolation between configuration and operation
Configuration errors are disturbingly common
No interlocks
● Put main circuit into loopback
● Put protect circuit into loopback
OSPF
Far too easy to cause OSPF-TE to fail
● Flapping interfaces drove CPU to 100%
● CPU then fails to generate OSPF Neighbour Hellos
● OSPF loses adjacencies
● CPU returns to 0%
● Repeat
OSPF, cond
Fixes
● Obvious solution is to rate limit repeated OSPF next state output where state inputs are the same
  – Router manufacturers have gone for simpler variants of this, such as rate limiting all state changes
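The simpler vendor variant, rate limiting all state changes, can be sketched as a minimum interval between recalculations. The class name, the interval and the use of caller-supplied timestamps are assumptions for illustration; real routers use more elaborate back-off schemes.

```python
class RecalculationThrottle:
    """Allow at most one route recalculation per min_interval seconds."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self.last_run = None   # time of the last recalculation
        self.pending = False   # a change arrived while throttled

    def on_state_change(self, now: float) -> bool:
        """Return True if a recalculation should run now."""
        if self.last_run is None or now - self.last_run >= self.min_interval:
            self.last_run = now
            self.pending = False
            return True
        self.pending = True    # remember that work is owed once the interval ends
        return False
```

With a 5-second interval, a flapping interface reporting changes at 0.0, 0.1 and 0.2 seconds triggers only one recalculation, leaving CPU free to keep generating Hellos.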
OSPF, cond
Fixes, cond
● Dynamic alternatives to Dijkstra algorithm
  – Run time depends on “importance” of lost link, not size of total database
  – In practice, about 10% of the resources of the standard algorithm
Topic – AARNet's experiences
● Configuration
● Protection
● Measurement
● Properties of international links
● Future
Measurement
Traceroute and ping haven't been useful as performance-measuring tools since flow routing
MPLS nails coffin shut
● It's all faked at egress router, probably on slow path
Active measurement
Need to allow for parallel paths
● Four adjacent IP addresses on measuring platforms
● Hashing will place these on differing paths to the same destination
Need to use a fast-path protocol
● Not ICMP
Be careful not to measure the measurement host
Active measurement
Loss
● Indicates major fault or congestion
Latency
● Indicates protection or misconfiguration
  – Measurement system needs to know nominal latency for main and protect paths
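One way a measurement system might use the nominal latencies is to classify each sample against them. This is only a sketch: the tolerance, the example latencies and the function name are hypothetical, and a real system would smooth over several samples before raising an alarm.

```python
def classify_latency(measured_ms, main_ms, protect_ms, tolerance_ms=5.0):
    """Guess which path a sample took from its latency.

    Returns "main", "protect" or "unexpected" (possible misconfiguration).
    """
    if abs(measured_ms - main_ms) <= tolerance_ms:
        return "main"
    if abs(measured_ms - protect_ms) <= tolerance_ms:
        return "protect"
    return "unexpected"
```

For instance, with a nominal 80 ms main path and 120 ms protect path, a 119 ms sample suggests a protection event, while 200 ms matches neither path.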
SNMP
Needed to detect protection event
Needed to detect loss of protect path
● RECOVERY
  Service: Tunnel8194 (Broadway Seattle) Backup
  Host: SCCN Broadway Router
  Address: 162.231.212.20
  State: OK
  Interface: OK – 1
  Date: Sat 20 Jul 12:54:24.3
Useful for checking configuration
● Each label switch path has a reverse path
● Each main LSP has a protect LSP
MPLS load sharing in operation – SCCN interfaces in Sydney
[Graphs: Sydney, NSW to Seattle, WA; Sydney, NSW to Manoa, HI to Seattle, WA]
MPLS load sharing in operation
Graphs are similar in shape but not in detail
● Load sharing is by hashing
  – As round robin would deliver every second packet out-of-order
● Sydney-Manoa traffic is not load-shared
  – As the southern path is Sydney-NZ-Fiji-Seattle-Manoa
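Hash-based load sharing can be sketched as follows: every packet of a given source and destination pair hashes to the same path, so packets within a flow stay in order, while different pairs spread across the parallel paths. The choice of CRC32 over the address pair and the path names are assumptions for illustration; routers use their own hash functions and inputs.

```python
import zlib


def pick_path(source: str, destination: str, paths):
    """Choose one of the parallel paths from a hash of the address pair."""
    key = f"{source}>{destination}".encode()
    return paths[zlib.crc32(key) % len(paths)]


PATHS = ["north", "south"]  # hypothetical names for the two parallel paths

# Four adjacent source addresses on a measurement host, one destination:
# each pair is pinned to a single path, but the pairs need not share a path.
choices = {
    f"192.0.2.{host}": pick_path(f"192.0.2.{host}", "198.51.100.1", PATHS)
    for host in range(1, 5)
}
```

Because the split depends on which flows happen to hash where, the two links carry similar but not identical load, which matches graphs that agree in shape but not in detail.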
MPLS traffic engineering in operation – Manoa
Typically file transfer traffic
[Graph: label switched path Sydney, NSW to Manoa, HI]
MPLS traffic engineering in operation – Manoa
No load sharing Sydney-Manoa
● South path only used if much more direct North path fails
Topic – AARNet's experiences
● Configuration
● Protection
● Measurement
● Properties of international links
● Future
Failure patterns
Many small single-segment failures
● These are usually intentional
  – Software upgrades
  – Maintain consistent active service age of equipment
● “Hits” of 50ms or less
  – Better to blackhole this traffic rather than attempt a protection switch
● When should we declare a path failure?
  – A long hold-down time avoids needless MPLS fast reroutes, at the cost of greater time to restore service upon a genuine segment failure
Failure patterns
Causes of major failures
  – Physical break of cable
    ● SCCN had a cable break whilst maintaining the Protect segment
  – Craft technician error
    ● Decommission or loopback of wrong link
Failures are often made worse
  – Loopback test both segments simultaneously
  – Insufficient CPU provisioning in control plane
  – Network Management System fails when most needed
MPLS fast re-route tuning
Configuration
● Need to calculate value for MPLS fast re-route hold-down timer from capacity vendor's SDH automatic protection switching tables
  – You'll get lots of small hits otherwise
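The trade-off can be sketched numerically. The idea assumed here is that the hold-down timer should exceed the longest protection switch the SDH layer may perform on its own, so that MPLS reacts only to outages SDH did not repair. The switching times, the safety margin and the function names are hypothetical.

```python
def hold_down_ms(sdh_switch_times_ms, margin_ms=10):
    """Hold-down timer: longest expected SDH protection switch plus a margin."""
    return max(sdh_switch_times_ms) + margin_ms


def should_reroute(outage_ms, timer_ms):
    """Fast re-route only if the outage outlasts the hold-down timer."""
    return outage_ms > timer_ms


timer = hold_down_ms([40, 50])  # hypothetical per-section switching times
```

With these figures a 50 ms hit is ridden out, while a 200 ms outage triggers a fast re-route.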
International link interior routing design
International links are an obvious OSPF stub area
● With OSPF default pointing back towards NOC
  – BGP default might point towards a US ISP
  – Exterior default overrides interior default during normal operation
● A stub area is good as we want to isolate MPLS-TE information for an international link
H.323 configuration
H.323 gatekeeper should always reject calls to PoP console server modems
● Forcing calls to re-route to PSTN without needing a prefix
● Uni phone books never list that uni's prefix to defeat VoIP toll bypass
Personal relationships are important
SCCN has been forthright and honest about failures
● Helps considerably to estimate risk of outage recurring
● US ISPs compare poorly
AARNet's unusual requirements interested the SCCN technical staff
Allowed us to build excellent relationships which have carried over into operation
Topic – AARNet's experiences
● Configuration
● Protection
● Measurement
● Properties of international links
● Future
Future intentions
Obviously fuzzy
Lots of trans-Pacific capacity coming available
● SCCN (AU-US)
● AU-JP
● US-JP
Opportunity to construct protection against design and operational failure by international capacity providers
MPLS across undersea vendors
MPLS should be able to do multi-vendor protection better than SDH
● SDH has no segment visibility in this application
● No clocking issues
● No “profile” issues
MPLS VPNs look like a good idea
MPLS VPNs look attractive for virtual research networks
● For example, to research routing protocols
● When MPLS becomes a campus technology it allows network policies smaller than an autonomous system
  – Not necessarily a good thing
  – Moves the complexity (not removes the complexity)
  – At least the complexity is no longer seen in global BGP routing table
MPLS VPNs look like a bad idea
VPNs are useful for “crunchy outside, soft inside” firewalled networks
● Do not need a firewall at each site
● Firewall configuration is simpler
Assumes that the “baddies” are on the outside, ROFL
MPLS configured as best effort offers no protection from denial of service attacks in network “interior”
Use of MPLS to simplify BGP
A&R networks often want to offer transit to other A&R networks
Problem: BGP configuration for this can be complex, and transit network gets caught up in this complexity
Solution: offer MPLS transit
Use of MPLS to simplify BGP
We often resort to policy routing because we want multiple routing instances in one router
● Operational nightmare, especially in protection scenarios
Can use MPLS to implement this
● Run two BGP instances, one per VPN
● Place interfaces in particular MPLS VPNs
MPLS monitoring
Routers don't provide nearly enough performance information
● Protection
  – How long did protection take?
  – What was the cost in CPU resource?
  – Enabling capacity planning for protection
● End-to-end performance
  – Loss
  – Latency, especially changes
Forwarding equivalence class and quality of service
A forwarding equivalence class is mainly about routing
Quality of service is mainly about queuing
Two choices
Place differing QoS into differing FECs
● Label switch router uses label to infer forwarding and queuing
Place differing QoS into same FEC
● Use Experimental bits to mark 3 bits of service
  – Treating Exp similarly to IP's DSCP
Not really a choice, we can do both
It might make router configuration easier if queuing discipline were always driven from Exp bits
Even if Exp always has the same value for that forwarding equivalence class
Traffic engineering and QoS routing
Traffic engineering can be used to recover some quality services before others
● Recover voice services first, then best-effort data, then video
See “reservation priority”
Experience with QoS so far
No harder than IP DiffServ :-)
● Same lack of coherent, total solution
● Same “will be supported in next version” issues
MPLS on Linux
mpls-linux on sourceforge
● Kernel patches against 2.4
  – Compiles against 2.4.18-rc3-ac1 after about an hour's work
● Includes Nortel-based LDP, library and command line configuration. Supports tunnels and MPLS opcode programming. Over ATM, ethernet and (with patches) PPP.
● Beta
MPLS on Linux
Zebra ospfd
● OSPF-TE with Opaque LSA
● CVS is usually more stable than releases
● Late beta
MPLS on Linux
NIST Switch
● For BSD
● Reasonably complete MPLS, RSVP-TE implementation
● Web site suggested a Linux port could happen, but this was in 2001
● Current status?
GMPLS expands MPLS's aims
World domination
● One control plane protocol
  – RSVP-TE
Controlling all switching mechanisms
● MPLS
  – ATM
  – Ethernet
  – RPR
● SDH/SONET
GMPLS expands MPLS's aims, cond
By viewing all switching as a special case of MPLS switching we can get a single
● Control layer
  – Not one per switching mechanism
● Management domain
  – Not one per vendor per switching mechanism
● Security mechanism
  – Not a billion passwords, all known and unalterable
Further reading – Books
Davie & Rekhter, MPLS: Technology and applications
● Good but dated
Alwayn, Advanced MPLS design and implementation
● Good coverage of TE and VPNs
● “Advanced” only in sense of not a “... for dummies” book
Further reading – Internet drafts
Sharma, Hellstrand (eds), Framework for MPLS-based recovery
● draft-ietf-mpls-recovery-frmwrk-05
Lai, McDysan (eds), Boyle, et al, Network hierarchy and multilayer survivability
● draft-ietf-tewg-restore-hierarchy-00