S9500 L2 MPLS VPN (VPLS) Technology White Paper
Hangzhou H3C Technologies Co., Ltd. 1/27
L2 MPLS VPN (VPLS) Technology White Paper
Keywords: MPLS, VPLS
Abstract: MPLS technologies make it very easy to provide VPN services based on IP
technologies and MPLS VPNs are highly scalable and easy-to-manage. There are two
MPLS-based VPN services: L3 MPLS VPN and L2 MPLS VPN. L2 MPLS VPN further
includes VPLS and VLL. VLL applies to point-to-point networking scenarios, while VPLS
supports point-to-multipoint and multipoint-to-multipoint networking. From users’ point of
view, the whole MPLS network is a Layer 2 switched network, through which Layer 2
connections can be established between sites. This document describes VPLS.
Acronyms:
Acronym Full spelling
MPLS Multiprotocol Label Switching
VPLS Virtual Private LAN Service
Table of Contents
1 Overview
2 Basic Networking Architecture
3 Features
3.1 Terminology
3.2 Protocol Processing Mechanism
3.2.1 Basic Transmission Components of VPLS
3.2.2 MAC Address Learning and Flooding
3.2.3 VPLS Loop Avoidance
3.2.4 Peer PE Discovery and PW Signaling Protocol
3.2.5 H-VPLS Implementation Mode
3.3 Packet Frame Structure
3.3.1 Packet Encapsulation on the AC
3.3.2 Packet Encapsulation on the PW
3.3.3 VPLS Packet and Encapsulation
3.4 Processing of User Data in the Entire Network
3.4.1 Processing of Common L2 and L3 User Data in the Entire Network
3.4.2 Processing of User Protocol Packet Data in the Entire Network
4 Networking Overview
4.1 Key Points of VPLS Networking
4.2 Typical VPLS Networking Example
5 Features of H3C S9500
5.1 Features of H3C S9500 VPLS
5.1.1 Complete H-VPLS Solution
5.1.2 Feature Configuration and Processing of VPLS Instances
5.1.3 H-VPLS AC Backup in the ME Mode
5.1.4 Load Balance and Service Backup
5.1.5 Binding of Multiple VLANs with a Single VPLS Instance
5.2 Processing Flow of H3C S9500 VPLS
5.3 VPLS-Relevant Features of H3C S9500
6 References
1 Overview
As economic globalization advances, enterprises are becoming more widely distributed and their employees more mobile. This drives telecom operators to provide link connections so that the branches of an enterprise can be joined into a corporate intranet and employees can easily access that intranet from outside the enterprise.
At first, the operators provided enterprises with links over leased lines. The major disadvantages of this approach are that it does not suit the multi-branch, fast-growth profile of modern enterprises and that it is costly and difficult to manage. Later, with the rise of ATM and frame relay, the operators turned to providing customers with point-to-point L2 connections over virtual circuits, on top of which the customers built their own L3 networks to carry IP, IPX and other traffic. These technologies all provided point-to-point L2 connections and required complex configuration; in particular, adding a site demanded a large amount of reconfiguration.
Today, IP networks span the globe, and how to use the existing IP networks to provide low-cost private networks for enterprises has gradually become the operators' focus. A technology has therefore emerged that provides VPN services over the IP network at any desired rate with simple configuration: MPLS VPN. There are two MPLS-based VPN services: L3 MPLS VPN and L2 MPLS VPN. L2 MPLS VPN further includes VPLS and VLL. VLL applies only to point-to-point networking, whereas VPLS can implement multipoint-to-multipoint VPN networking; it offers a more complete solution for operators who provide point-to-point L2VPN service, and it avoids the intervention in the customer's internal routing that L3VPN entails. In this way, an operator needs to manage and operate only a single network on which multiple services (e.g., best-effort IP service, L3 VPN, L2 VPN, traffic engineering and differentiated services) are provided. This greatly reduces the operator's construction, maintenance and operation costs.
The VPLS service enables geographically dispersed users to connect with one another over the MAN/WAN, so that the sites appear to be connected within a single LAN. A series of IETF drafts[1] describe a VPLS solution that uses MPLS pseudowires (PWs) as Ethernet links to provide a transparent LAN service (TLS) across the MPLS network.
2 Basic Networking Architecture
The VPLS-related draft[2] proposes two VPLS network architectures: the VPLS network with fully meshed logical PW (pseudowire) connections and the hierarchical VPLS architecture, as shown in Figure 1 and Figure 2.
Figure 1 Common VPLS network architecture
As shown in Figure 1, the PEs at the various sites of the VPLS network are logically fully meshed. The VPLS network can provide a point-to-multipoint connection service like L3VPN, and the PEs can learn MAC addresses and switch packets among multiple points. The MPLS network provides tunnels for transparently transmitting VPN packets; the P equipment in the network is not involved in MAC address learning and switching and only forwards MPLS packets. Moreover, the forwarding tables of the VPNs on the PEs are independent of one another, so MAC addresses may overlap between VPNs.
Figure 2 Hierarchical VPLS network architecture
As shown in Figure 2, in the hierarchical VPLS architecture the fully meshed logical connections are implemented among the core equipment (NPEs), while each user-facing PE (UPE) connects only to the nearest NPE via a PW to exchange packets with the peer sites. The network topology thus becomes hierarchical and the access range is expanded. In the core network, the NPE offers high performance and rich functions to carry the aggregated VPN service flows, while the UPE has lower performance and functional requirements and is used for VPN service access. Meanwhile, link backup can be implemented between the edge access equipment and the NPE, which enhances network robustness. The access network between the UPE and the NPE can be an MPLS edge network (connected via VPLS or VLL) or a simple Ethernet (connected via QinQ). In addition, the access mode of each UPE in the hierarchical VPLS architecture is not fixed; the access type from the UPE to the NPE can be freely selected for each VPN site according to the actual access network conditions.
3 Features
3.1 Terminology
• MPLS L2VPN: transparently transmits L2 data across the MPLS network. From the user's point of view, the MPLS network is an L2 switched network over which L2 connections can be set up among different sites. There are two types of MPLS L2VPN: VLL and VPLS.
• VPLS (Virtual Private LAN Service): a point-to-multipoint L2VPN service provided over the public network. It enables geographically dispersed users to connect with one another over the MAN/WAN, so that the sites appear to be connected within a single LAN.
• VLL (Virtual Leased Line): a point-to-point L2VPN service provided over the public network. It makes two sites appear to be connected by a leased line; it cannot provide switching among multiple points on the service provider side.
• CE (Customer Edge): the customer edge equipment directly connected to the service provider.
• PE (Provider Edge Router): the edge router of the backbone network, connected to the CE for VPN service access. It maps and forwards packets from the private network into public network tunnels and from public network tunnels back into the private network. PEs can be further divided into UPEs and NPEs.
• UPE (User-facing Provider Edge): the PE equipment close to the user side; it serves as the aggregation equipment through which users access the VPN.
• NPE (Network Provider Edge): the core PE, located at the edge of the core domain of the VPLS network, providing transparent VPLS transport across the core network.
• VSI (Virtual Switch Instance): maps the actual VPLS access links to the various PWs.
• PW (Pseudowire): a bidirectional virtual connection between two VSIs, composed of a pair of unidirectional MPLS VCs.
• AC (Attachment Circuit): the connection between the CE and the PE; it may be a physical interface or a virtual interface. In general, all user packets on the AC, including the users' L2/L3 protocol packets, should be transparently transmitted to the peer site.
• QinQ (802.1Q in 802.1Q): a mechanism that uses the 802.1Q-based tunneling capability of Ethernet switches to provide multipoint L2VPN services. It encapsulates the user's private network VLAN tag within a public network VLAN tag, so the packet carries both tags while crossing the provider's backbone, giving the user a simpler L2VPN tunnel.
• Forwarder: the forwarding function of a PE. The PE receives data frames over the AC, and the forwarder selects the PW over which to forward them; the forwarder is in effect the VPLS forwarding table.
• Tunnel: carries PWs; one tunnel, generally an MPLS tunnel, can carry multiple PWs. A tunnel is a direct channel between a local PE and its peer PE for transparently transmitting data between the two.
• Encapsulation: packets transmitted over a PW use the standard PW encapsulation format. There are two VPLS encapsulation modes on the PW: tagged mode and raw mode.
• PW signaling: the PW signaling protocol is the basis of VPLS implementation, used to establish and maintain PWs. It can also be used to automatically discover the peer PEs of a VSI. At present there are two PW signaling protocols: LDP and BGP.
• Quality of service: mapping the priority information in the user's L2 packet header to a QoS priority for transmission over the public network; generally this requires an MPLS network that supports traffic engineering.
3.2 Protocol Processing Mechanism
The VPLS-related draft describes the basic transmission components of the VPLS network; all VPLS services are accomplished by this set of components, and the VPLS solution in the draft likewise centers on how these basic transmission components are formed and used. In addition, the draft provides a hierarchical VPLS solution in which the PW connections are not fully meshed.
3.2.1 Basic Transmission Components of VPLS
The whole VPLS network behaves like one huge switch. It establishes PWs between the sites of each VPN via MPLS tunnels and transparently transmits the users' L2 packets over these PWs. While forwarding a packet, a PE learns the source MAC address and creates a MAC forwarding table entry, thereby completing the mapping between MAC addresses and user Attachment Circuits (ACs)/PWs. The P equipment only needs to forward MPLS data according to the MPLS label, without examining the L2 user packets encapsulated inside the MPLS packets.
The transmission components of the VPLS network and their functions are described
as follows:
• Attachment Circuit (AC): a connection line or virtual link between the CE and the PE. In general, all user packets on the AC, including the users' L2/L3 protocol packets, should be transparently transmitted to the peer site.
• Pseudowire (PW): a bidirectional virtual connection established between two VSIs of the same VPN. It is composed of a pair of unidirectional MPLS VCs, carried over LSPs and established via the PW signaling protocol. To the VPLS system, the PW is a direct channel from a local AC to the peer AC for transparently transmitting the users' L2 data.
• Forwarder: the PE receives data frames over the AC, and the forwarder selects the PW over which to forward them; the forwarder is in effect the VPLS forwarding table.
• Tunnel: carries PWs; one tunnel, generally an MPLS tunnel, can carry multiple PWs. A tunnel is a direct channel between a local PE and its peer PE for transparently transmitting data between the two.
• Encapsulation: packets transmitted over a PW use the standard PW encapsulation format. There are two VPLS encapsulation modes on the PW: tagged mode and raw mode.
• Pseudowire signaling: the PW signaling protocol is the basis of VPLS implementation, used to establish and maintain PWs. It can also be used to automatically discover the peer PEs of a VSI. At present there are two PW signaling protocols: LDP and BGP.
• Quality of service: mapping the priority information in the user's L2 packet header to a QoS priority for transmission over the public network; generally this requires an MPLS network that supports traffic engineering.
The positions of the basic transmission components of VPLS in the network are shown in Figure 3:
Figure 3 Basic transmission components of VPLS
Take the packet flow of VPN1 from CE1 to CE3 as an example to describe the basic data flow. CE1 sends an L2 packet, which enters PE1 via the AC. When PE1 receives the packet, the forwarder selects a PW for forwarding it, and the system then pushes two MPLS labels according to the PW's forwarding entry (the private network label identifies the PW, while the public network label is used to cross the tunnel and reach PE2). After the packet traverses the public network tunnel and reaches PE2, the system pops the private network label (the public network label has already been popped via PHP on the P equipment). The forwarder of PE2 then selects the AC over which to forward the L2 packet from CE1 to CE3.
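The label operations along this path can be sketched as follows; the label values, function names and list representation are purely illustrative, not taken from any real implementation:

```python
# Sketch of the VPLS data path from CE1 to CE3 (hypothetical label values).
# The ingress forwarder pushes the PW (private) label, then the tunnel
# (public) label; PHP pops the tunnel label at the penultimate P router, and
# the egress PE pops the PW label to recover the original L2 frame.

def pe1_ingress(l2_frame, pw_label=1025, tunnel_label=3001):
    """Ingress PE: push PW label (inner), then tunnel label (outer)."""
    return [tunnel_label, pw_label, l2_frame]

def p_php(packet):
    """Penultimate P router: pop the outer tunnel label (PHP)."""
    return packet[1:]

def pe2_egress(packet):
    """Egress PE: pop the PW label; the frame is forwarded on the chosen AC."""
    pw_label, l2_frame = packet[0], packet[1]
    return pw_label, l2_frame

frame = "L2 frame from CE1"
labeled = pe1_ingress(frame)      # [3001, 1025, frame] entering the tunnel
after_php = p_php(labeled)        # [1025, frame] arriving at PE2
pw, out = pe2_egress(after_php)   # PW label identifies the VSI; frame goes to CE3
```

The inner label survives all the way to the egress PE precisely because only the outer label is swapped or popped in the core.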
3.2.2 MAC Address Learning and Flooding
The control plane of VPLS does not need to advertise or distribute reachability information; instead, it relies on the address learning of the standard bridging function in the data plane to provide reachability.
(1) Source MAC address learning
MAC address learning involves two parts:
• Remote MAC address learning associated with a PW
Because a PW is composed of a pair of unidirectional VC LSPs (the PW is regarded as up only when the VC LSPs in both directions are up), when the VC LSP in the ingress direction has learnt a previously unknown MAC address, that address should be mapped to the VC LSP in the egress direction of the same PW.
• Local MAC address learning on the port directly connected to the user
For an L2 packet sent from the CE, the source MAC address in the packet is learnt on the corresponding port of the VSI.
The address learning and flooding process of the PE is illustrated in Figure 4 .
Figure 4 Address learning and flooding process of the PE
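The two learning parts above, together with the aging timer described in (3) below, can be sketched as a toy VSI FIB; the class name, API and aging default are assumptions for illustration, not an H3C interface:

```python
# Minimal sketch of source MAC learning in a VSI: MACs arriving from a local
# AC map to that AC, MACs arriving over the ingress VC LSP of a PW map to the
# egress side of the same PW, and every hit refreshes the entry's aging timer.

import time

class VsiFib:
    def __init__(self, aging_seconds=300):
        self.entries = {}            # MAC -> (port, last_seen timestamp)
        self.aging = aging_seconds

    def learn(self, src_mac, in_port, now=None):
        """Learn a new source MAC or refresh an existing entry."""
        now = time.time() if now is None else now
        self.entries[src_mac] = (in_port, now)

    def lookup(self, dst_mac, now=None):
        """Return the egress port, or None (unknown/aged out -> flood)."""
        now = time.time() if now is None else now
        entry = self.entries.get(dst_mac)
        if entry is None or now - entry[1] > self.aging:
            return None
        return entry[0]

fib = VsiFib()
fib.learn("00:aa", "AC1", now=0)         # local learning on the AC
fib.learn("00:bb", "PW-to-PE2", now=0)   # remote learning against the PW
print(fib.lookup("00:bb", now=10))       # known: forward on PW-to-PE2
print(fib.lookup("00:aa", now=1000))     # None: entry has aged out
```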
(2) MAC address reclamation
Dynamically learnt MAC addresses must support refresh and re-learning mechanisms. The VPLS-related draft[2] defines an address reclamation message that uses the optional MAC TLV to delete or re-learn a specified list of MAC addresses.
When the topology changes, the address reclamation message can be used to remove MAC addresses quickly. The message falls into two types: with a MAC address list and without one. When a notification message carrying a MAC address list to be re-learnt arrives over a backup link that is becoming active, the PE updates the MAC address entries in the FIB of the VPLS instance and sends the message on to the directly connected PEs over the other relevant LDP sessions. If the notification message contains a null MAC address TLV list, the PE should remove all MAC addresses in the specified VPLS instance except those learnt from the PE that sent the message.
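The handling of the two message types can be sketched as follows; the function signature and FIB representation are illustrative, not the actual LDP TLV format:

```python
# Sketch of processing an address-withdraw message with an optional MAC TLV.
# An explicit list removes exactly those entries; an empty list means "flush
# every MAC in this VPLS instance except those learnt from the sender".

def apply_mac_withdraw(fib, mac_list, from_pw):
    """fib maps MAC -> port; mac_list is the (possibly empty) MAC TLV list."""
    if mac_list:
        for mac in mac_list:             # explicit list: remove those entries
            fib.pop(mac, None)
    else:                                # empty list: flush all but the sender's
        for mac in [m for m, port in fib.items() if port != from_pw]:
            del fib[mac]
    return fib

fib = {"A": "PW1", "B": "PW2", "C": "AC1"}
apply_mac_withdraw(fib, ["A"], "PW1")    # removes only A
apply_mac_withdraw(fib, [], "PW2")       # flushes all entries not learnt on PW2
print(fib)                               # {'B': 'PW2'}
```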
(3) MAC address aging
The remote MAC addresses learnt by a PE require an aging mechanism to remove entries associated with VC labels that are no longer in use. Each time a packet is received, the aging timer for its source address is reset.
3.2.3 VPLS Loop Avoidance
In an ordinary L2 network, STP must be enabled to avoid loops, but the private network's STP clearly should not extend into the ISP's network. In VPLS, fully meshed connections and split horizon forwarding are used instead, so that STP need not run on the ISP network. Each PE must create, for each VPLS forwarding instance, a tree to all the other PE routers in that instance, and each PE router must apply the split horizon policy to avoid loops: a PE router must not forward packets between PWs of the same VPLS instance (because all PEs of a VPLS instance are directly interconnected). In this sense, split horizon forwarding means that data packets received from PWs on the public network side are never forwarded to other PWs, but only to the private network side.
From the user's point of view, running STP on the L2VPN private network side is allowed; all STP BPDU packets are simply transparently transmitted across the ISP's network.
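The split-horizon rule can be expressed as a simple port-selection predicate; the port names are assumed labels for illustration:

```python
# Sketch of the split-horizon rule: a frame received from a public-network PW
# may be flooded to local ACs only, never to other PWs, because all PEs of a
# VPLS instance are already directly connected by the full mesh.

def egress_ports(in_port, acs, pws):
    """Return the ports a flooded frame may be sent to."""
    if in_port in pws:
        return sorted(acs)                            # PW -> ACs only
    return sorted((set(acs) | set(pws)) - {in_port})  # AC -> everything else

acs, pws = ["AC1", "AC2"], ["PW1", "PW2", "PW3"]
print(egress_ports("PW2", acs, pws))  # ['AC1', 'AC2'] - never back onto a PW
print(egress_ports("AC1", acs, pws))  # ['AC2', 'PW1', 'PW2', 'PW3']
```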
3.2.4 Peer PE Discovery and PW Signaling Protocol
For the PEs in the same VSI, the peer addresses can be manually specified or discovered automatically by an auto-discovery mechanism. At present, the peer PEs of a VSI can be discovered automatically via BGP or LDP, and these two protocols can also serve as the PW signaling protocol for establishing PWs. Establishing a PW means allocating a demultiplexer label (the VC label) and advertising it to the peer PE. In addition to label distribution, the PW signaling protocol also advertises the parameters relevant to the VPLS system, for example the PW ID, control word and interface parameters. Through the PW signaling protocol, fully meshed PWs can be established between the PEs to serve the VPLS.
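The information exchanged when a PW is established can be sketched as a simple record; the field names and the up-check are hypothetical illustrations, not an actual LDP message layout:

```python
# Sketch of the parameters a PW signaling protocol advertises per PW. Each PE
# allocates its own VC label and sends it to the peer; the PW comes up only
# after mappings in both directions have been exchanged and parameters agree.

from dataclasses import dataclass

@dataclass
class PwLabelMapping:
    pw_id: int          # identifies the PW (must match on both PEs)
    vc_label: int       # VC label allocated by the advertising PE
    control_word: bool  # whether a control word is used on the PW
    mtu: int            # interface parameter: MTU agreed for the PW

local = PwLabelMapping(pw_id=100, vc_label=1025, control_word=False, mtu=1500)
remote = PwLabelMapping(pw_id=100, vc_label=2048, control_word=False, mtu=1500)

pw_up = (local.pw_id == remote.pw_id
         and local.mtu == remote.mtu
         and local.control_word == remote.control_word)
print(pw_up)  # True: both unidirectional VCs are signaled and consistent
```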
3.2.5 H-VPLS Implementation Mode
Because the VPLS solution described above requires fully meshed tunnel LSPs between all the PE routers providing the VPLS service, n*(n-1)/2 PWs must be established among n PEs for each VPLS service, and all of these PWs are generated by the signaling protocol. The real drawback is that this solution cannot scale to large deployments, because the PE routers terminating the VCs must replicate data packets: each PE must send a copy to every peer for an unknown-destination, broadcast or multicast packet. Through hierarchical connections, the signaling load and data packet replication can be reduced (although the total number of replicated broadcast packets remains unchanged, the replication is shared among multiple devices in H-VPLS), so that VPLS can be deployed on a large scale.
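The scaling argument can be checked with a quick computation; the H-VPLS function assumes, for illustration, that each UPE homes to exactly one NPE:

```python
# PW count per VPLS instance: a full mesh of n PEs needs n*(n-1)/2 PWs,
# while H-VPLS keeps the full mesh only among the (typically few) NPEs and
# adds one spoke PW per UPE.

def full_mesh_pws(n):
    """Number of PWs in a full mesh of n PEs."""
    return n * (n - 1) // 2

def h_vpls_pws(npes, upes):
    """Full mesh among NPEs plus one spoke PW per single-homed UPE."""
    return full_mesh_pws(npes) + upes

print(full_mesh_pws(10))  # 45 PWs for 10 fully meshed PEs
print(h_vpls_pws(3, 7))   # 10 PWs for 3 NPEs and 7 UPEs covering 10 edges
```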
Typically, the ISP places small edge devices at the customer premises and aggregates them into a PE at the central office. It is therefore quite natural to extend the VPLS tunneling technology to the MTU (Multi-Tenant Unit): the MTU equipment can then be regarded as a PE and used to provide the basic VPLS virtual connection service at each edge. The feasible technologies include using a PW or a QinQ logical interface between the MTU and the PE. In the two-layer hierarchical VPLS, one layer consists of the VPLS core PWs (hub) and the other of the extended access PWs (spoke).
(1) Two access means of H-VPLS
The two access means of H-VPLS are illustrated in the following figures:
Figure 5 LSP access mode of H-VPLS
As shown in Figure 5, the UPE acts as the aggregation equipment (MTU). It establishes only one PW, the U-PW, with NPE1, and establishes no PW with the other peers. Data are forwarded as follows: the UPE sends a packet from a CE to NPE1, adding the demultiplexer label (MPLS label) of the U-PW to the packet. Upon receipt, NPE1 determines the packet's VSI from the demultiplexer label and then, according to the destination MAC address of the user data packet, adds the demultiplexer label of the appropriate N-PW before forwarding the packet. When NPE1 receives a packet from an N-PW, it adds the demultiplexer label of the U-PW and sends the packet to the UPE, which then forwards it to the CE.
If CE1 and CE2, both local CEs, exchange data, the UPE forwards the packets between them directly through its bridging function, without passing them up to NPE1. However, for a first packet or a broadcast packet whose destination MAC address is unknown, the UPE still forwards the packet over the U-PW to NPE1 while also broadcasting it through the bridge to CE2, so that NPE1 can replicate the packet and forward it to each peer CE.
Figure 6 QinQ access mode of H-VPLS
As shown in Figure 6, the MTU is a standard bridge device. QinQ is enabled on the CE access ports, and the attached VLAN tag serves as the demultiplexing field. A packet is transparently transmitted to PE1 via the QinQ tunnel between the MTU and PE1. PE1 determines the VSI from the VLAN tag attached by the MTU and then, according to the destination MAC address of the user data packet, adds the demultiplexer label (MPLS label) of the appropriate PW before forwarding the packet. When PE1 receives a packet from a PW, it determines the packet's VSI from the demultiplexer label (MPLS label) and then, according to the destination MAC address, adds the VLAN tag so the QinQ tunnel can carry the packet to the MTU, which then forwards it to the CE.
If CE1 and CE2, both local CEs, exchange data, the MTU forwards the packets between them directly through its bridging function, without passing them up to PE1. However, for a first packet or a broadcast packet whose destination MAC address is unknown, the MTU still forwards the packet over the QinQ tunnel to PE1 while also broadcasting it through the bridge to CE2, so that PE1 can replicate the packet and forward it to each peer CE.
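The QinQ access path can be sketched as follows; the tag values, dictionary layout and function names are illustrative assumptions:

```python
# Sketch of QinQ access in H-VPLS: the MTU pushes an outer (public) VLAN tag
# in front of the user frame, and the PE uses that outer tag to select the
# VSI, exactly as a PW demultiplexer label would on the MPLS side.

def mtu_push_outer_tag(frame, outer_vlan):
    """QinQ: add the service VLAN tag ahead of any user VLAN tag."""
    return {"outer_vlan": outer_vlan, "inner_frame": frame}

def pe_select_vsi(qinq_frame, vlan_to_vsi):
    """PE side: the outer tag is the multiplexing field that picks the VSI."""
    return vlan_to_vsi[qinq_frame["outer_vlan"]]

user_frame = {"user_vlan": 10, "payload": "data"}
tagged = mtu_push_outer_tag(user_frame, outer_vlan=100)
vsi = pe_select_vsi(tagged, vlan_to_vsi={100: "VSI-A", 200: "VSI-B"})
print(vsi)  # VSI-A
```

Note that the user's own VLAN tag (10 here) rides inside untouched, which is what makes the tunnel transparent to the customer network.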
(2) Backup of the H-VPLS AC
Since there is only a single link between the MTU/UPE and the PE/NPE, this solution has an obvious weakness: once the AC fails, all the VPNs connected to the aggregation equipment are cut off. Therefore, for both access models of H-VPLS, a backup link should be designed: in normal operation the equipment uses only one link (the master) for access; once the VPLS system detects that the master link has failed, it activates the backup link to continue providing VPN services.
For H-VPLS using LSP access, because an LDP session runs between the UPE and the NPE, the state of the LDP session can be used to determine whether the master PW has failed. For H-VPLS using QinQ access, STP should run between the MTU and the PEs it connects to, so as to ensure that the other link is activated once the master link fails.
As shown in Figure 7, the UPE detects that the U-PW to NPE1 has failed, so it automatically activates the backup PW to carry the data. Suppose a packet whose source MAC address is "A" in CE1 initially reaches CE3 via the master PW. Through the MAC address learning of VPLS, this MAC address is learnt by the corresponding virtual interfaces on NPE1 and NPE3. Since NPE3 is unaware that a link switchover has occurred at the far end, it still retains the MAC address entry, which is obviously incorrect. For this reason, the relevant MAC addresses should be reclaimed when the UPE performs the active/standby PW switchover. MAC address reclamation can be implemented with the LDP address reclamation message. If many MAC addresses need to be reclaimed, an address reclamation message with a null MAC address list can be sent directly to clear all the MAC addresses in the VPN (except the address entries of the link over which the reclamation message was sent).
(The figure shows the UPE homed to NPE1 via the master U-PW and to NPE2 via the backup U-PW; NPE1, NPE2 and NPE3 are fully meshed by PW1, PW2 and PW3, with CE1 and CE2 attached to the UPE and CE3 to NPE3. The entries "MAC A, U-PW" and "MAC A, PW1" learnt before the switchover are reclaimed afterwards.)
Figure 7 MAC address update after the active/standby PW switchover
The MAC address reclamation message is sent and processed as follows: the UPE sends the MAC address reclamation message to NPE2. After processing the message, NPE2 relearns the address "MAC A" against the backup PW and then forwards the message to the other peers (NPE1 and NPE3). Those peers process the received message and likewise relearn "MAC A" against the corresponding PWs.
(3) Multi-domain VPLS service
Hierarchical VPLS can also be used to build a larger-scale VPLS service, avoiding the need for full-mesh connections among all the VPLS equipment within a single VPLS domain or across multiple domains. Each fully meshed VPLS network is connected via a single LSP tunnel, and each VPLS network uses one PW to connect two domains. When more than two domains are connected, a full mesh of inter-domain PWs must be established among the edge PEs. This creates a three-layer model: the direct connections between the MTUs and the PEs; the fully meshed connections among the PEs within a domain; and the fully meshed connections among the inter-domain edge PEs.
3.3 Packet Frame Structure
3.3.1 Packet Encapsulation on the AC
The packet encapsulation mode on the AC is determined by the user access mode. There are two user access modes, VLAN access and Ethernet access, defined as follows:
• VLAN access: the header of the Ethernet frame sent upstream from the CE or downstream from the PE carries a VLAN tag, a service delimiter assigned by the ISP to differentiate users. We call this tag the "P-Tag".
• Ethernet access: the header of the Ethernet frame sent upstream from the CE or downstream from the PE carries no service delimiter. If the frame header contains a VLAN tag, it is merely the internal VLAN tag of the user packet and is meaningless to the PE equipment. We call this internal user VLAN tag the "U-Tag".
The access mode of a user's VSI can be specified through configuration.
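The distinction between the two access modes can be summarized in a small sketch; the mode strings are assumed labels for illustration, not configuration keywords:

```python
# Sketch of how the PE interprets the outermost VLAN tag of a frame on the
# AC: under VLAN access it is a service delimiter (P-Tag) meaningful to the
# PE, while under Ethernet access any tag present is the user's own U-Tag.

def classify_outer_tag(access_mode, frame_has_tag):
    """Return what the outermost tag means to the PE, or None if untagged."""
    if not frame_has_tag:
        return None
    return "P-Tag" if access_mode == "vlan" else "U-Tag"

print(classify_outer_tag("vlan", True))      # P-Tag: ISP service delimiter
print(classify_outer_tag("ethernet", True))  # U-Tag: user's internal tag
print(classify_outer_tag("ethernet", False)) # None: no tag at all
```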
3.3.2 Packet Encapsulation on the PW
There are also two packet encapsulation modes on the PW: Raw mode and Tagged
mode.
In the Raw mode, the P-Tag is not transmitted on the PW: For the uplink packet on the
CE side, if a packet with a service delimiter is received, the service delimiter will be
removed first before the packet is sent upward, attached with two layers of MPLS
labels and then forwarded; or if a packet without service delimiter is received, the
packet will be directly sent upward and then attached with two layers of MPLS labels
before being forwarded. For the downlink packet on the PE side, the packet will be
added or not added (depending on the specific configurations) with a service delimiter
before being forwarded to the CE but it is not allowed to rewrite or remove any existing
tag.
In Tagged mode, the frame transmitted on the PW must carry the P-Tag. For an
uplink packet from the CE: if the packet carries a service delimiter, the
delimiter is kept, the packet is sent upward, attached with two layers of MPLS
labels, and forwarded; if the packet carries no service delimiter, a null tag
is added first, and the packet is then sent upward, attached with two layers of
MPLS labels, and forwarded. For a downlink packet on the PE side, the service
delimiter is rewritten, removed, or retained (depending on the configuration)
before the packet is forwarded to the CE.
The protocol [2] stipulates that Tagged mode applies by default.
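The uplink tag handling for the two PW modes can be sketched as follows. This is a minimal illustrative model; the frame representation and function name are assumptions, not S9500 software:

```python
def pw_upstream_tags(frame_tags, pw_mode):
    """Return the VLAN tag stack carried on the PW (before the two MPLS
    labels are pushed) for an uplink frame received from the CE.

    frame_tags: tags on the received frame, outermost first; a service
    delimiter (P-Tag), if present, is the outermost entry.
    """
    tags = list(frame_tags)
    has_p_tag = bool(tags) and tags[0].get("role") == "P-Tag"
    if pw_mode == "raw":
        # Raw mode: the P-Tag never crosses the PW; strip it if present.
        if has_p_tag:
            tags.pop(0)
    elif pw_mode == "tagged":
        # Tagged mode: a P-Tag must be carried; push a null tag if absent.
        if not has_p_tag:
            tags.insert(0, {"role": "P-Tag", "vid": 0})
    else:
        raise ValueError("unknown PW mode: %r" % pw_mode)
    return tags
```

Note that the downlink direction is asymmetric between the two modes: Raw mode may only add a delimiter, while Tagged mode may also rewrite or remove it.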
3.3.3 VPLS Packet and Encapsulation
As shown in Figure 8 through Figure 11, the green arrows show the
encapsulation of user packets that do not carry a private network VLAN tag as
they pass among devices playing different VPLS roles, while the purple arrows
show the encapsulation of user packets that carry a private network VLAN tag.
In addition, the encapsulation format between the PEs (on the PWs) shown in
the figures does not consider the PHP operation on the outer-layer tunnel
label. If PHP is taken into account, the packet encapsulation on the PWs may
carry a single MPLS label (the inner-layer label).
Figure 8 Link packet encapsulation in the Raw mode via Ethernet access
Figure 9 Link packet encapsulation in the Tagged mode via Ethernet access
Figure 10 Link packet encapsulation in the Raw mode via VLAN access
Figure 11 Link packet encapsulation in the Tagged mode via VLAN access
3.4 Processing of User Data in the Entire Network
3.4.1 Processing of Common L2 and L3 User Data in the Entire Network
According to the characteristics of VPLS services, the common L2 and L3 user data
will be transparently transmitted to the peer end, including the MAC header of the user
packet and the private VLAN tag of the user.
For the unicast packet with a known MAC address from the PE, the system will
transparently transmit the packet to the corresponding CE.
For an unknown unicast, multicast or broadcast packet of the user, the system will
broadcast it in the entire VPLS domain, that is, all the CEs will receive the packet.
For an L3 packet of the user, the VPLS system will forward it based on the L2 header
of the packet without caring about the content of the L3 packet.
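The forwarding behavior above (known unicast to a single port, everything else flooded in the VPLS domain) can be sketched as follows. This is an illustrative model, not the S9500 implementation; for brevity it also omits the PW-to-PW split horizon that a real VPLS PE applies:

```python
class VsiForwarder:
    """Minimal sketch of per-VSI MAC learning and flooding.

    Ports stand for both ACs (CE-facing) and PWs (PE-facing).
    """

    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}          # destination MAC -> outgoing port

    def forward(self, src_mac, dst_mac, in_port):
        self.mac_table[src_mac] = in_port           # learn the source MAC
        out = self.mac_table.get(dst_mac)
        if out is not None and out != in_port:
            return {out}                            # known unicast: one port
        # unknown unicast, multicast, broadcast: flood to every other port
        return self.ports - {in_port}
```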
3.4.2 Processing of User Protocol Packet Data in the Entire Network
Since the intermediate P equipment forwards packets only based on the outer-layer
MPLS label without caring about whether the packet is a common packet or a protocol
packet, all the L2 and L3 protocol packets of the user will be transparently transmitted
by the VPLS system. The protocol packet of the private network will not interact with
the protocol of the VPLS system. They are independent from each other. The private
network protocol data will not affect the public network protocol.
For the user protocol packet whose destination MAC address is a unicast MAC
address, the system will transparently transmit the packet to the corresponding CE.
For the user protocol packet whose destination MAC address is a multicast or
broadcast address, the system will broadcast it in the entire VPLS domain and
all the CEs will receive the protocol packet.
4 Networking Overview
4.1 Key Points of VPLS Networking
(1) Logical full-mesh connections among the PEs
l Fully-meshed PWs must be set up among all the PEs for the VPLS basic
networking.
l Fully-meshed PWs must be set up among the NPEs for the H-VPLS
networking.
(2) Correct configuration of user access modes and access ports
l The access modes of all the VPLS instances at the access port must be
consistent.
l In the VLAN access mode, the uplink packets of the user must carry P-TAG, the
access port should be configured as “Trunk”, and the corresponding VLAN of
the connected VPN shall be allowed to pass.
l In the Ethernet access mode, the uplink packets of the user cannot carry
P-TAG but can carry the user private network tag, the access port should be
configured as “Access”, and the QinQ function of the port should be enabled.
(3) Correct UPE configuration in the H-VPLS networking
l The roles of UPE and NPE must be made clear. Incorrect configurations will
cause loops in the VPLS domain.
l The UPE is allowed to access one NPE only. When there are active and
standby links, it can access two NPEs.
l The NPE can access multiple UPEs.
l When the UPE accesses the NPE via QinQ, the access mode of the
corresponding instances on the NPE should be VLAN access. When there is
link backup, STP needs to be enabled between the UPE and the two NPEs to
back up the links.
l When the UPE accesses the NPE via LSP, the UPE can access the NPE in VLL
or VPLS mode, and it should be specified at the NPE that the access
equipment is a UPE. If there are active and standby PWs, the
active/standby relation of the NPEs must be specified.
l The role definitions of UPE and NPE are only within a certain VPLS instance.
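The full-mesh requirement in (1) grows quadratically with the number of PEs, which is the main motivation for the H-VPLS hierarchy; a quick check:

```python
def full_mesh_pw_count(n):
    """Bidirectional PWs required for a full mesh among n PEs (or NPEs)."""
    return n * (n - 1) // 2

# A flat VPLS with 10 PEs already needs 45 PWs; H-VPLS keeps only the NPE
# core fully meshed, so each added UPE contributes just one uplink PW
# (or two, when active and standby links are used).
```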
4.2 Typical VPLS Networking Example
See Figure 12:
Figure 12 Network diagram for typical VPLS networking
5 Features of H3C S9500
S9500 series switches support the basic VPLS networking and H-VPLS networking.
For the H-VPLS networking, they support multiple access modes: QinQ, VPLS and
VLL. The VPLS service feature board is adopted in the S9500 for centralized
processing of VPLS services. The S9500 supports comprehensive feature
management of VPLS instances and provides a good VPLS solution.
5.1 Features of H3C S9500 VPLS
5.1.1 Complete H-VPLS Solution
The S9500 fully supports the H-VPLS solution proposed in the draft. Its VPLS service
access network can be a common Ethernet or an MPLS edge network. In addition, the
access networks where multiple sites of a VPLS instance are located are independent
from each other. The local Ethernet access network can interwork with the peer MPLS
edge access network.
5.1.2 Feature Configuration and Processing of VPLS Instances
To facilitate VSI maintenance and management, a series of features can be supported
for each VPN, such as VSI traffic limit, broadcast traffic limit, MAC address quantity
limit and QoS class.
VSI traffic limit refers to the maximum traffic that the VPN can access on the PE. Once
the user traffic exceeds this limit, the user packets will be discarded.
To limit the L2 broadcast packets of the VPN, the user is allowed to specify the
broadcast traffic limit. VSI broadcast traffic limit refers to the percentage of the
broadcast traffic in the VPN to the maximum VPN traffic on the PE. Once the
broadcast traffic exceeds the broadcast suppression percentage of the VSI traffic limit,
the user’s broadcast packets will be discarded.
Since the forwarding table entry resources of the system are limited, it is necessary to
limit the MAC address quantity of VSI. Once the number of hosts in each VSI exceeds
the MAC address quantity limit of the VSI, the VPLS system will no longer learn MAC
forwarding table entries. The operator can configure a proper MAC address limit value
for the VPN users, so as to ensure that they can run internal private network services
normally. In some exceptional cases (such as the user equipment is infected by
viruses or there is MAC address attack), the system resources can be prevented from
being used up by the VSI.
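The MAC address quantity limit described above amounts to a simple admission check at learning time. A minimal sketch, under the assumption that already-known addresses may still be refreshed:

```python
def may_learn(mac_table, mac, limit):
    """Decide whether a VSI may learn a MAC address under its configured limit.

    Refreshing an already-known address is always allowed; only learning of
    new addresses stops once the table reaches the limit, which is what caps
    resource usage during a MAC-flooding attack.
    """
    return mac in mac_table or len(mac_table) < limit
```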
When forwarding user packets, the VPLS system can classify the packets
according to their priorities to ensure that important data is forwarded
preferentially to the peer over the public network. The system will map the
priority information in the L2 packet
header of the user into the QoS priority for transmission over the public network
(mapping it into a tunnel transmission priority). Generally the MPLS network that
supports traffic engineering should be applied. There is a table of mapping from IEEE
802.1Q COS to tunnel EXP in the relevant protocol.
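Both the 802.1p CoS field and the MPLS EXP field are 3 bits wide, so the mapping is a small eight-entry table. An identity mapping is shown below purely as an illustration; the actual table is deployment-specific:

```python
# Illustrative CoS -> tunnel EXP table (identity mapping shown; a real
# deployment may remap, e.g. to reserve high EXP values for control traffic).
COS_TO_EXP = {cos: cos for cos in range(8)}

def tunnel_exp(cos):
    """Map a frame's 802.1p CoS value to the tunnel label's EXP bits."""
    return COS_TO_EXP[cos & 0x7]
```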
5.1.3 H-VPLS AC Backup in the ME Mode
The S9500 supports the backup of the AC from the MPLS edge network to the
H-VPLS. Since the UPE serves as a convergence device, all the VPLS services that
access via the UPE will be affected once the link between the UPE and the NPE is
faulty. To enhance the stability, the AC backup function can be enabled.
5.1.4 Load Balance and Service Backup
The S9500 supports load balance between multiple VPLS service boards, improving
the VPLS forwarding performance of the system. The S9500 also supports service
backup. When a service board fails, the S9500 automatically switches the traffic of the
board to a board working normally. This improves the reliability of VPLS services.
Currently, VPLS divides labels evenly into eight label ranges, numbered 0 to 7.
By establishing two tiers of mapping, VPLS redirects the services of VSIs to
VPLS boards for processing:
l Mapping between VSIs and label ranges
When you configure a VSI, the S9500 automatically selects for the VSI the label
range with the greatest number of available labels. You can also use the
command that the S9500 provides to specify a label range for the VSI.
l Mapping between label ranges and VPLS boards
Mapping between label ranges and VPLS boards is implemented by configuring
redirection on public network interfaces. The configuration command has two
important parameters: one specifies the redirection rule, namely the VPLS label
range; the other specifies the VPLS board. The command allows assigning label
ranges to VPLS boards. A single label range cannot be assigned to more than one
VPLS board.
Using the above mechanisms, you can assign VSIs to label ranges evenly, and
assign label ranges to VPLS boards evenly, implementing load balance between
multiple VPLS service boards.
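The first tier of the mapping above (assigning a new VSI to a label range) can be sketched as follows; tie-breaking by lowest range id is an assumption, as the white paper does not specify it:

```python
def pick_label_range(free_labels):
    """Select, for a new VSI, the label range with the most available labels.

    free_labels: dict mapping label range id (0-7) -> free label count.
    On a tie, the lowest-numbered range wins (illustrative assumption).
    """
    return max(sorted(free_labels), key=lambda r: free_labels[r])
```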
When a board fails or is pulled out, the S9500 immediately selects the VPLS board
which is servicing the least number of VSIs to take over all the VSI services on the
original board. After the original board comes back into service or a working
VPLS board is substituted, the S9500 waits for a period of time to see whether
the board works normally. If so, it switches the VSI services back to the board.
5.1.5 Binding of Multiple VLANs with a Single VPLS Instance
The S9500 supports binding a single VPLS instance with up to 64 local VLANs,
allowing the local VLANs to communicate with each other and to communicate with
the remote VLANs bound to the same VPLS instance at Layer 2. This expands the
user access scope greatly. Note that the number of VLANs that can be bound with a
VPLS instance may vary. Refer to the relevant specification description.
Figure 13 Bind multiple VLANs with a single VPLS
In the example shown in Figure 13, VLAN 10 and VLAN 11 are bound with the same
VPLS instance. The following describes the data forwarding process:
When an ARP request of VLAN 10 arrives at the VPLS service board, the board
looks up its ARP table based on the destination IP address. As no match is
found, the board broadcasts the ARP request to all remote user-side networks of
the same VPLS instance and to all the other local VLANs bound with the same
VPLS instance (VLAN 11 in this example). When the local PE receives the ARP
response from VLAN 11, it forwards the response to VLAN 10.
5.2 Processing Flow of H3C S9500 VPLS
User-side packets enter the S9500 via its common interface boards. The system
sends all the data in the private VLAN to which the VPLS service is bound
upward to the VPLS service board for centralized processing. After learning the MAC address
and searching for the forwarding table entry, the system will add two layers of MPLS
labels to the original user packet and then forward it to the next-hop device of the peer
PE.
In S9500, since the VPLS packets at the PW side cannot be processed by common
interface boards, the user needs to configure redirection rules on the corresponding
port of the public network, so as to direct the packets to the VPLS service board for
processing. At present, S9500 supports the establishment of up to 128K PWs, that is,
the system can allocate 128K private network labels for PW establishment. During the
VPLS service configuration, the user is required to configure a rule to redirect the
MPLS packets within the private network label range corresponding to the PW on the
public network port, so that the VPLS services from the PW side will be directed to the
service board for centralized processing. After learning the MAC address and
searching for the forwarding table entry, the system forwards the packets to the CE.
When the S9500 runs VPLS, it requires a VPLS service board for centralized
processing. Figure 14 illustrates the processing model.
Figure 14 Diagram for VPLS processing on the S9500
5.3 VPLS-Relevant Features of H3C S9500
l Supports both Martini (LDP) and Kompella VPLS. Refer to the specific
specification description.
l Supports up to four VPLS service boards for load balance and service backup.
l The VPLS boards use NP boards for centralized processing and upgrade of
VPLS boards can be implemented by upgrading the software.
l The VPLS service boards provide no interfaces; traffic input and output
depend on LPUs.
l The VPLS service boards take responsibility for MPLS label encapsulation and
MPLS forwarding, and therefore the LPUs can be standard ones or enhanced
ones and do not necessarily support MPLS.
l Supports MAC address aging and reclamation.
l Supports MPLS network loop avoidance based on full mesh and split horizon.
l Allows private networks to run STP for loop avoidance and supports
transparent transport of STP protocol messages between private networks.
l Supports H-VPLS and provides QinQ and LSP access between UPE and NPE.
l Supports VSI traffic bandwidth limiting.
l Supports VSI broadcast traffic limiting.
l Supports VSI MAC address limiting and helps in preventing MAC address
attacks.
l Supports QoS and mapping from CoS priorities to EXP priorities.
For other relevant information, refer to the related documents.
6 References
[1] Draft: draft-ietf-pwe3-control-protocol-11.txt
[2] RFC 4761: Virtual Private LAN Service (VPLS) Using BGP for Auto-Discovery
and Signaling
RFC 4762: Virtual Private LAN Service (VPLS) Using Label Distribution Protocol
(LDP) Signaling
Draft: draft-ietf-l2vpn-vpls-ldp-03.txt
Draft: draft-ietf-l2vpn-vpls-bgp-02.txt
Copyright ©2007 Hangzhou H3C Technologies Co., Ltd. All rights reserved.
No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of Hangzhou
H3C Technologies Co., Ltd.
The information in this document is subject to change without notice.
S9500 NAT Technology White Paper
Copyright © 2007 Hangzhou H3C Technologies Co., Ltd. Page 1/20
H3C S9500 NAT Technology White Paper
Keywords: NAT
Abstract: Network Address Translation (NAT) provides a way of translating the source IP address
in an IP packet header to another IP address. In practice, NAT is primarily used to allow
users using private IP addresses to access the Internet. With NAT, a few public IP
addresses are used by a larger number of private network hosts to solve the problem of
IP addresses depletion.
Acronyms:
Acronym Full spelling
NAT Network Address Translation
Table of Contents
1 Overview................................................................................................................................... 3
2 Introduction to NAT.................................................................................................................... 3
2.1 Related Terms................................................................................................................. 3
2.2 Operation of NAT ............................................................................................................ 4
2.2.1 Single Instance ..................................................................................................... 4
2.2.2 Multi-Instance ....................................................................................................... 9
3 Application Scenarios .............................................................................................................. 12
3.1 Common POP Network ................................................................................................. 12
3.2 Multi-ISP Network Using Policy-Based Routing ............................................................. 13
3.3 Multi-Instance VPN-Public NAT ..................................................................................... 14
3.4 Multi-Instance VPN-VPN NAT ....................................................................................... 15
4 H3C S9500 Characteristics ..................................................................................................... 16
4.1 Overview....................................................................................................................... 16
4.1.1 Use of Network Processor................................................................................... 16
4.1.2 Large Capacity, High Performance...................................................................... 17
4.1.3 Support for Access to Internal Servers ................................................................ 17
4.1.4 Support for Static Address Translation................................................................. 17
4.1.5 Rich ALG Features.............................................................................................. 17
4.1.6 Blacklist Function................................................................................................ 17
4.1.7 Logging function ................................................................................................. 18
4.1.8 Support for VPN Users........................................................................................ 18
4.1.9 Limit to the Numbers of Users and Connections Within a VPN ............................ 18
4.1.10 NAT for Inter-VPN Communication .................................................................... 19
4.2 NAT Operation Process of the H3C S9500 .................................................................... 19
4.2.1 NAT Single Instance............................................................................................ 19
4.2.2 NAT Multi-Instance.............................................................................................. 19
1 Overview
As the Internet is faced with IPv4 address depletion, NAT and IPv6 were
introduced to solve this problem. NAT is based on the following fact: in a
private network (such as an enterprise network), only a small number of hosts
access the Internet at any given time, and about 80% of the traffic stays
within the network. Therefore, the hosts are assigned private IP addresses
(the IANA reserves the address blocks 10.0.0.0/8, 172.16.0.0/12, and
192.168.0.0/16). A private address does not need to be globally unique; it can
be reused in different private networks and is translated into a public
address when the host using it accesses the Internet.
The MPLS L3 VPN technology is widely used, especially in large enterprise networks,
for it inherits the advantage of IP routing and integrates the fast forwarding and
flexible networking characteristics of MPLS. MPLS L3 VPN features network structure
simplification, easy maintenance, stable performance, and secure network access.
Integrating NAT with MPLS L3 VPN can make a private network invisible to the
outside to enhance network security, and help save operating costs by providing
reusable IP addresses. NAT multiple-instance enables perfect integration of NAT and
MPLS L3 VPN by allowing for access to the Internet and between VPNs through NAT,
and address reuse in different VPNs.
2 Introduction to NAT
2.1 Related Terms
l NAT: Provides a way of translating private IP addresses into public IP
addresses, allowing hosts in a private network (or a public network) to access
the public network (or the private network).
l NAPT: Network Address and Port Translation (NAPT) identifies each internal
host by TCP/UDP port number or by the identifier field of ICMP packets.
Unless otherwise stated, the port numbers of IP packets refer to the
TCP/UDP port numbers or the identifier of ICMP packets. NAPT can better
utilize IP address resources by allowing more internal hosts to access the
Internet simultaneously.
l VPN: Virtual Private Network (VPN) enables construction of private networks
over a shared public network by using multiple technologies, such as MPLS,
tunneling and encryption. Unless otherwise stated, the term VPN refers to Layer
3 VPN (BGP/MPLS VPN) in this document.
l ALG: Application Layer Gateway (ALG) provides address translation for some
special application layer protocol packets (such as ICMP destination
unreachable packets, FTP packets, and ILS packets). These application layer
protocols need to negotiate port numbers between client and server, and thus
the corresponding NAT entries are created based on the negotiation results; the
private IP addresses or port numbers are contained in the payload of such
protocol packets.
l EASY IP: Uses the IP address of an interface on the router as the public IP
address for translation through NAPT, to save IP address resources.
l FTP: The File Transfer Protocol (FTP) is used to transfer a file from a file
system to another.
l DNS: Domain Name System (DNS) is a distributed database used by TCP/IP
applications to translate domain names into IP addresses and provide email-
related routing information.
l ILS: Internet Location Service (ILS) is a dynamic directory service function
provided by Microsoft. Users can store and search dynamic information (such
as IP address) through ILS.
l FIB: Forwarding Information Base (FIB) stores the core data for Layer 3 packet
(IP packet) forwarding.
l ARP: The Address Resolution Protocol (ARP) is used to resolve an IP address
into a MAC address.
l NP: A Network Processor (NP) is a programmable, high-performance network
processor for handling packets.
2.2 Operation of NAT
2.2.1 Single Instance
1. NAT
NAT only translates IP addresses, as shown in Figure 1.
Figure 1 NAT operation
NAT operates as follows:
(1) The NAT device receives a packet from the private host to the public host.
(2) The NAT device selects an unused public address from its address pool and
establishes corresponding NAT entries (both inbound and outbound).
(3) The NAT device uses the outbound NAT entry to translate the source private IP
address to the public address and sends the packet to the public host.
(4) After receiving a response packet from the public host, the NAT device uses the
inbound NAT entry to translate the destination public IP address to the private
address and sends the packet to the private host.
Note that NAT cannot solve IP address depletion effectively, and is not commonly
adopted in practice.
2. NAPT
NAPT translates both IP addresses and port numbers (or the identifier field of ICMP
messages) and can better utilize IP address resources, allowing more internal hosts
to access the Internet simultaneously. NAPT does not support non-TCP/UDP/ICMP
packets.
Figure 2 NAPT process
As shown in the above figure, NAPT operates as follows:
(1) The NAT device receives a packet from the private host to the public host.
(2) If the connection is new, the NAT device selects an unused IP address and
a port number from its address pool, and then creates corresponding NAT
entries (both outbound and inbound).
(3) The NAT device uses the outbound entry to translate the source private IP
address and port number to the public ones and sends the packet to the public
host.
(4) After receiving a response packet from the public host, the NAT device
uses the inbound NAPT entry to translate the destination IP address and port
number to the private ones and sends the packet to the private host.
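Steps (1) through (4) can be sketched as follows. Port allocation is simplified to a counter here; a real device manages a port pool and ages out idle entries:

```python
class Napt:
    """Illustrative sketch of NAPT with a single public address."""

    def __init__(self, public_ip, first_port=1024):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound = {}   # (priv_ip, priv_port) -> (pub_ip, pub_port)
        self.inbound = {}    # (pub_ip, pub_port)  -> (priv_ip, priv_port)

    def translate_out(self, priv_ip, priv_port):
        key = (priv_ip, priv_port)
        if key not in self.outbound:                 # new connection
            mapping = (self.public_ip, self.next_port)
            self.next_port += 1
            self.outbound[key] = mapping             # create both entries
            self.inbound[mapping] = key
        return self.outbound[key]

    def translate_in(self, pub_ip, pub_port):
        # Returns None when no entry exists (unsolicited inbound packet).
        return self.inbound.get((pub_ip, pub_port))
```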
3. NAPT internal server
Normally, public hosts have no permission to access most private hosts, but they may
need to access some internal servers. The problem is that NAPT entries cannot be
dynamically generated when public hosts initiate connections to internal servers. To
solve this problem, you can configure NAT internal servers on the NAPT device, that
is, to configure mappings between public IP addresses/port numbers and private IP
addresses/port numbers.
Figure 3 Operation of NAT internal server
As shown in the above figure, NAPT with an internal server configured operates as
follows:
(1) The NAT device receives a packet from the public host to the internal
server.
(2) The NAT device uses the outbound NAPT entry to translate the destination
public IP address and port number to the private ones and sends the packet to
the internal server.
(3) After receiving a response from the internal server, the NAT device uses
the inbound NAPT entry to translate the source private IP address and port
number to the public ones, and sends the packet to the public host.
4. NAPT ALG
Some application layer protocols need to negotiate port numbers between client and
server, so that the server can initiate connections to the client using the negotiated
port numbers (such as the establishment of an FTP data channel). If the NAT device
knows nothing about the negotiation process, it cannot perform translation between
private IP address/port number and public IP address/port number, and thus the
server and client cannot access each other. NAT ALG can solve this problem. The
following takes FTP as an example to describe ALG operation.
There are two FTP modes, Common FTP and Passive FTP. In Common FTP mode,
the client specifies a port for the server to establish a connection. If the client resides
in a private network, the NAT device needs to use ALG to generate a NAT/NAPT
entry through which the server can access the client. In Passive FTP mode, the
server specifies a port for the client to establish a connection. If the server resides in
a private network, the NAT device also needs to use ALG for the client to access the
server. When a private client wants to access a public server in Passive FTP mode,
or when a public client wants to access a private server in Common FTP mode, the
connection is initiated from the private network and thus ALG need not be used.
l Common FTP
If a private FTP client wants to access a public FTP server, two TCP connections
need to be established. One is the control connection (TCP port number 21 on the
server) which is used to forward control information, such as commands and
parameters; the other one is the data connection (TCP port number 20 on the server)
which is used to transmit files.
Figure 4 Common FTP mode
The client notifies its port number and IP address through the PORT command to the
FTP server over a control connection, and then the server initiates a TCP data
connection at port 20 to the specified IP address and port.
To allow the public server to access the private client, the corresponding NAT entries
need to be created on the NAT device. To do so, the NAT device monitors the control
flow between the client and the server. It uses the private IP address and port number
in the received PORT command to create NAT/NAPT entries, and replaces them with
the corresponding public ones in the PORT command.
l Passive FTP
In Passive FTP mode, both the control and data connections are initiated by the client.
The client sends a PASV request through the control channel to tell the server that it
will use the Passive FTP mode. Then, the client uses a port above 1023 to transmit
data to the server at a port dynamically assigned, which may not be 20.
Figure 5 Passive FTP mode
In the PASV request, the client notifies the server to use a specified data port (not the
default data port). Then, the server sends a response containing the port number and
its IP address, and waits for the client to initiate a connection.
The PASV response sent from the server is:
227 Entering Passive Mode (A1,A2,A3,A4,a1,a2)
In this message, 227 is the PASV response code; A1,A2,A3,A4 represents the
server IP address; (a1*256 + a2) is the port number of the server, which has
the same format as that of the PORT command.
If the server resides in a private network, NAPT entries need to be created on
the NAT device for the public client to access the private server. To do so,
the NAT device uses the private IP address and port number in the PASV
response received from the server to create the corresponding NAT/NAPT
entries, and replaces the private ones in the PASV response with the public
ones.
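The address-and-port extraction an ALG performs on the 227 response can be sketched as follows, assuming the RFC 959 framing with the six digits in parentheses:

```python
def parse_pasv(response):
    """Extract the server address and port from a 227 PASV response,
    e.g. "227 Entering Passive Mode (192,0,2,10,7,139)".
    """
    digits = response[response.index("(") + 1 : response.index(")")]
    a1, a2, a3, a4, p1, p2 = (int(x) for x in digits.split(","))
    # Port is encoded as two bytes: high * 256 + low.
    return "%d.%d.%d.%d" % (a1, a2, a3, a4), p1 * 256 + p2
```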
2.2.2 Multi-Instance
1. NAT
NAT multi-instance extends NAT single-instance to support VPN address translation,
ensuring the same private IP addresses used in different VPNs are translated into
different public IP addresses. See Figure 6 for the translation process of NAT multi-
instance.
Figure 6 NAT process
NAT multi-instance operates as follows:
(1) As shown in the above figure, the NAT device receives a packet from a
private host to a public host.
(2) If it is the first time that the private host accesses the public network,
the NAT device selects an unused public IP address from its address pool and
establishes corresponding NAT entries (both inbound and outbound), containing
the VPN name, source private IP address, and assigned public IP address.
(3) The NAT device uses the outbound NAT entry to translate the source private
IP address into the public one and sends the packet to the public host.
(4) After receiving a response from the public host, the NAT device uses the
inbound NAT entry to translate the destination public IP address into the
private one and forwards the packet to the private host in the corresponding
VPN.
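The key point of the steps above is that entries are keyed by the (VPN, private address) pair, so the same private address in two VPNs maps to two public addresses. A minimal sketch (names and pool handling are illustrative):

```python
class MultiInstanceNat:
    """Sketch of NAT multi-instance: entries keyed by (VPN, private IP)."""

    def __init__(self, pool):
        self.pool = list(pool)   # free public addresses
        self.outbound = {}       # (vpn, priv_ip) -> pub_ip
        self.inbound = {}        # pub_ip -> (vpn, priv_ip)

    def translate_out(self, vpn, priv_ip):
        key = (vpn, priv_ip)
        if key not in self.outbound:
            pub = self.pool.pop(0)           # first unused public address
            self.outbound[key] = pub
            self.inbound[pub] = key
        return self.outbound[key]

    def translate_in(self, pub_ip):
        # The entry carries the VPN name, so the response can be forwarded
        # back into the correct VPN.
        return self.inbound.get(pub_ip)
```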
2. NAPT
NAPT multi-instance extends NAPT single-instance to support VPN address
translation.
Figure 7 NAPT
NAPT multi-instance operates as follows:
(1) As shown in the above figure, the NAT device receives a packet from a private host to a public host.
(2) If it is a new connection from the private network, the NAT device selects an unused IP address and a port number from its address pool, and then creates corresponding NAT entries (both outbound and inbound), containing the VPN name, private IP address/port number, and assigned public IP address/port number.
(3) The NAT device uses the outbound NAPT entry to translate the private IP address and port number into public ones and sends the packet to the public host.
(4) After receiving a response packet from the public host, the NAT device uses the inbound NAPT entry to translate the destination public IP address and port number into private ones and forwards the packet to the private host in the corresponding VPN.
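Compared with NAT multi-instance, NAPT also records the port, so one public address can serve many private hosts. A minimal sketch under the same assumptions (dict tables, an illustrative port allocator starting at 1024):

```python
# Sketch of NAPT multi-instance: entries are keyed on
# (VPN, private IP, private port); one public IP is shared and
# connections are disambiguated by the assigned public port.
class NaptMultiInstance:
    def __init__(self, public_ip, first_port=1024):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound = {}   # (vpn, priv_ip, priv_port) -> (pub_ip, pub_port)
        self.inbound = {}    # (pub_ip, pub_port) -> (vpn, priv_ip, priv_port)

    def translate_out(self, vpn, priv_ip, priv_port):
        key = (vpn, priv_ip, priv_port)
        if key not in self.outbound:      # new connection: allocate a port
            mapping = (self.public_ip, self.next_port)
            self.next_port += 1
            self.outbound[key] = mapping
            self.inbound[mapping] = key
        return self.outbound[key]

    def translate_in(self, pub_ip, pub_port):
        return self.inbound[(pub_ip, pub_port)]

napt = NaptMultiInstance("198.51.100.1")
m1 = napt.translate_out("vpn-a", "10.1.1.1", 5000)
m2 = napt.translate_out("vpn-b", "10.1.1.1", 5000)  # same tuple, other VPN
```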
3. NAPT multi-instance internal server
The internal servers in NAPT multi-instance provide VPN support at the private
network side. The operation process is similar to that of the single instance.
Figure 8 Operation of NAT internal server
NAPT multi-instance with an internal server configured operates as follows:
(1) As shown in the above figure, the NAT device receives a packet from the public host to the internal server.
(2) The NAT device uses the outbound NAPT entry to translate the destination public IP address and port number to the private ones and sends the packet to the internal server in the corresponding VPN.
(3) After receiving a response from the internal server, the NAT device uses the inbound NAPT entry to translate the source private IP address and port number to the public ones, and sends the packet to the public host.
4. NAPT multi-instance ALG
NAT multi-instance ALG extends NAT single-instance ALG to support VPN.
3 Application Scenarios
3.1 Common POP Network
The fast expansion of the Internet has resulted in a shortage of IPv4 addresses.
Therefore, NAT is used on high-end routers and core switches in large-sized
enterprise and metropolitan-area networks to facilitate network maintenance and
management. Figure 9 shows a common point of presence (POP) network.
Figure 9 Common POP network
3.2 Multi-ISP Network Using Policy-Based Routing
A private network may connect to multiple ISPs, as shown in Figure 10.
With policy-based routing (PBR) configured on the NAT device, hosts in network
10.8.1.0/24 can access the Internet through ISP 1, and hosts in network 10.8.2.0/24
can access the Internet through ISP 2. Configure address pool 1 (205.113.48.1
through 205.113.48.3) and address pool 2 (207.36.64.1 through 207.36.64.3) on the
NAT device. Address pool 1 belongs to ISP 1 and address pool 2 belongs to ISP 2.
When accessing the Internet, hosts in network 10.8.1.0/24 use the IP addresses in
address pool 1, and hosts in network 10.8.2.0/24 use the IP addresses in address
pool 2. Thus, hosts in different private network segments can access the Internet
through different ISPs, and can be provided with differentiated services.
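The policy described above boils down to selecting an address pool by source subnet. The following sketch uses the subnets and pools from this section; the dict-and-loop selection logic itself is an illustrative assumption, not the switch's PBR implementation.

```python
import ipaddress

# Source subnet -> ISP-specific address pool (values from the text).
POLICY = [
    (ipaddress.ip_network("10.8.1.0/24"),
     ["205.113.48.1", "205.113.48.2", "205.113.48.3"]),   # pool 1, ISP 1
    (ipaddress.ip_network("10.8.2.0/24"),
     ["207.36.64.1", "207.36.64.2", "207.36.64.3"]),      # pool 2, ISP 2
]

def select_pool(src_ip):
    """Return the NAT address pool for a source address, or None."""
    addr = ipaddress.ip_address(src_ip)
    for subnet, pool in POLICY:
        if addr in subnet:
            return pool
    return None                       # no matching policy

pool = select_pool("10.8.1.25")       # hosts in 10.8.1.0/24 use pool 1
```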
Figure 10 Multi-ISP network using policy-based routing
3.3 Multi-Instance VPN-Public NAT
As shown in the following figure, each PE has its own address pool for NAT
translation and supports MPLS encapsulation.
This networking mode mainly applies to enterprises, where users can assign IP
addresses independently.
Figure 11 Multi-instance NAT
The hosts of each VPN can access the Internet through NAT configured on the local
PE. When receiving a packet from a CE, the corresponding PE matches it against the
configured ACL to determine whether it is destined for the Internet. If so, the PE
translates the source IP address, adds a public MPLS label, and sends the packet out.
If the packet is destined for a host in another site of the same VPN, the PE
encapsulates the corresponding private and public labels and sends it out. In this way,
hosts in different sites of a VPN can access each other and the Internet over a
common link without any interference.
3.4 Multi-Instance VPN-VPN NAT
The same private IP addresses can be used in different branches of a government
network or an enterprise network. Besides accessing the Internet, VPN users may
need to access an authorized server that is usually placed in a VPN for security.
Other VPNs use RT to control the access to the server.
Figure 12 Multi-instance NAT
As shown in Figure 12, hosts in VPN 1 and VPN 2 need to access the Internet, as
well as a public server in the VPN named Server. To implement this application,
configure NAT on the PEs connected to VPN 1 and VPN 2 respectively, and
configure ACL rules to achieve address translation for packets from VPN 1 and VPN
2 to the Internet and VPN Server. Communication between different sites of VPN 1 is
enabled through Layer 3 forwarding.
4 H3C S9500 Characteristics
4.1 Overview
4.1.1 Use of Network Processor
The S9500 uses NAT boards to implement NAT functions. Because one S9500 can
have multiple NAT boards, you need to specify the NAT board number when
configuring a NAT entry or an internal server. An NP (network processor) is used as the core packet
processing chip on a NAT board. The NP is programmable and scalable to provide
flexible services.
4.1.2 Large Capacity, High Performance
Since the high-performance NP is used to process data packets, NAT of the S9500
features large NAT table capacity and powerful processing capabilities. The NAT
table can accommodate a maximum of 1.2 M NAT entries, the rate of link setup can
reach 150 Kpps, and the bidirectional packet forwarding rate through NAT can reach
3.0 Mpps. With 64-byte packets, this corresponds to a bidirectional translation rate
of about 1.5 Gbps. Because the NP processes traffic on a per-packet basis, larger
packets yield even higher forwarding throughput.
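The 1.5 Gbps figure follows directly from the quoted packet rate, as this small calculation shows:

```python
# Checking the quoted figures: 3.0 Mpps bidirectional, 64-byte packets.
pps = 3_000_000          # packets per second
packet_bits = 64 * 8     # bits per 64-byte packet
throughput_gbps = pps * packet_bits / 1e9
# throughput_gbps == 1.536, i.e. roughly the 1.5 Gbps quoted above
```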
4.1.3 Support for Access to Internal Servers
Through configuring an internal server (a mapping between private IP address/port
number and public IP address/port number) on the NAT device, you can allow public
hosts to access the internal server in a private network. Additionally, the H3C S9500
supports the AnyServer feature, which enables public hosts to access any port of a
protocol on the internal server (ICMP does not use ports). This helps simplify internal
server configuration on the NAT device.
4.1.4 Support for Static Address Translation
Through static address translation, a private address can be mapped to a fixed public
address. Thus, hosts in the private network can access public networks using a fixed
public address. In addition, static address translation supports point-to-point
applications by enabling a public host to directly access a private host.
4.1.5 Rich ALG Features
S9500 uses software to implement NAT ALG for packets. The S9500 NAT ALG
functions support FTP, TFTP, DNS, ICMP time exceeded/unreachable messages,
LDAP, MSN Messenger 7.0 voice/video, and other commonly used application
software.
4.1.6 Blacklist Function
To prevent a private host from excessively occupying public network bandwidth, you
can limit its total network connections using a NAT blacklist based on link setup rate,
number of connections, or both.
To limit the number of connections, you can set a threshold value. Then, if the
number of connections established from a user exceeds this value, the user is added
into the blacklist and cannot establish new connections. When the existing NAT
entries of the user have aged out, the NAT device waits 30 seconds and then
removes the user from the blacklist to allow the user to establish new connections.
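The connection-count rule above can be sketched as follows. The class, its names, and the way entry aging is signalled are illustrative assumptions; only the threshold check and the 30-second grace period come from the text.

```python
# Sketch of the connection-count blacklist: exceed the threshold and
# you are blacklisted; after all NAT entries age out, a 30-second
# grace period runs before new connections are allowed again.
class ConnectionBlacklist:
    GRACE = 30  # seconds to wait after the user's NAT entries age out

    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.count = {}          # user -> active connection count
        self.unblock_at = {}     # blacklisted user -> earliest unblock time

    def new_connection(self, user, now):
        if user in self.unblock_at:
            if now < self.unblock_at[user]:
                return False                       # still blacklisted
            del self.unblock_at[user]              # grace period over
        if self.count.get(user, 0) >= self.max_connections:
            self.unblock_at[user] = float("inf")   # until entries age out
            return False
        self.count[user] = self.count.get(user, 0) + 1
        return True

    def entries_aged_out(self, user, now):
        # All NAT entries of the user expired: start the 30 s grace timer.
        self.count[user] = 0
        if user in self.unblock_at:
            self.unblock_at[user] = now + self.GRACE

bl = ConnectionBlacklist(max_connections=2)
```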
To limit the link setup rate, the token bucket in the standard single-rate color-blind
mode is adopted. If a private host’s link setup rate exceeds the CIR, it is added into
the blacklist and cannot establish new connections. The user is removed from the
blacklist and can establish new connections when the link setup rate decreases to a
value that makes enough tokens available in the token bucket.
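A single-rate color-blind token bucket, as referenced above, can be sketched like this; the CIR and bucket depth values are illustrative, not S9500 defaults.

```python
# Sketch of the single-rate, color-blind token bucket used to police
# the link setup rate: each new link consumes one token, and tokens
# refill at the committed information rate (CIR).
class TokenBucket:
    def __init__(self, cir, depth):
        self.cir = cir            # tokens (new links) permitted per second
        self.depth = depth        # maximum burst size
        self.tokens = depth       # bucket starts full
        self.last = 0.0

    def conforms(self, now):
        # Refill proportionally to elapsed time, capped at the depth.
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.cir)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1      # one token per new link
            return True
        return False              # exceeds CIR: candidate for the blacklist

tb = TokenBucket(cir=10, depth=2)
```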
4.1.7 Logging Function
NAT entries can be logged to a server when they are established, when they age out,
and when they exceed the specified active time. NAT logging configuration items
include enabling of logging, the log version, source and destination IP addresses,
source and destination port numbers, and the logging mode (flow-begin, and the
interval for sending logs of active flows). When enabling logging, you can specify a
configured ACL to determine which packets need to be logged.
4.1.8 Support for VPN Users
Traditionally, when two VPNs use the same public IP address to access the Internet
through NAT, address conflicts will occur and packets returned from the public
network cannot be sent to the correct VPN. The S9500 NAT multi-instance feature
adds VPN information into NAT entries, allowing multiple VPNs to access the Internet
through a common NAT device without affecting each other.
MPLS and IP networks are also supported to provide various networking modes for
ISPs.
4.1.9 Limit to the Numbers of Users and Connections Within a VPN
If multiple enterprises (VPNs) want to access the Internet through a common NAT
device, you need to specify the maximum numbers of users and connections of each
VPN to prevent any enterprise from excessively using address resources.
4.1.10 NAT for Inter-VPN Communication
In traditional MPLS VPNs, one VPN can access another VPN through RT. However,
if the two VPNs use the same private IP addresses, address conflicts will occur, and
therefore inter-VPN communication cannot be implemented only by using RT. To
satisfy this requirement, the H3C S9500 can translate VPN private IP addresses into
the IP addresses in the NAT address pool.
4.2 NAT Operation Process of the H3C S9500
4.2.1 NAT Single Instance
1. Outbound operation process
(1) Look up the NAT entries. If a match is found, go to Step 3.
(2) Match the packet against the configured ACL to determine whether to perform NAT. If not, the packet is forwarded; if yes, select an address from the address pool. A port is also selected for NAPT translation.
(3) Translate the source IP address and port number of the packet. If ALG processing is needed, the packet is processed by NAT ALG.
(4) Look up the FIB table to forward the packet.
2. Inbound operation process
(1) Look up the NAT entries. If a match is found, go to the next step; otherwise, the packet is discarded.
(2) Translate the destination IP address and port number of the packet. If ALG processing is needed, the packet is processed by NAT ALG.
(3) Look up the FIB table based on the translated private IP address and forward the packet.
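The outbound steps above (entry lookup, ACL decision, allocation, translation) can be sketched as follows. The packet, ACL, pool, and table representations are simple Python stand-ins for illustration, not the NP's data path.

```python
# Sketch of the outbound single-instance NAPT path: look up the entry,
# consult the ACL on a miss, allocate an address/port, then translate.
def outbound(packet, table, acl_permits, pool):
    key = (packet["src_ip"], packet["src_port"])
    mapping = table.get(key)                      # step 1: entry lookup
    if mapping is None:
        if not acl_permits(packet):               # step 2: ACL decision
            return packet                         # forward untranslated
        mapping = pool.pop(0)                     # allocate IP and port
        table[key] = mapping
    translated = dict(packet)                     # step 3: translation
    translated["src_ip"], translated["src_port"] = mapping
    return translated                             # step 4: FIB lookup/forward

table = {}
pool = [("198.51.100.1", 1024), ("198.51.100.1", 1025)]
pkt = {"src_ip": "10.0.0.5", "src_port": 3333, "dst_ip": "198.51.100.99"}
out = outbound(pkt, table, lambda p: p["src_ip"].startswith("10."), pool)
```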
4.2.2 NAT Multi-Instance
NAT multi-instance extends NAT single instance by supporting VPNs. When a private host accesses a public host, NAT multi-instance creates a NAT/NAPT entry, which includes the VPN information. Thus, hosts in different VPNs can use the same private IP addresses. A packet returned from the public network matches the corresponding NAT entry and is forwarded to the VPN specified in the entry.
Copyright ©2007 Hangzhou H3C Technologies Co., Ltd. All rights reserved.
No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of
Hangzhou H3C Technologies Co., Ltd.
The information in this document is subject to change without notice.
S9500 Network Security Technology White Paper
Hangzhou H3C Technologies Co., Ltd. 1/10
H3C S9500 Network Security Technology White Paper
Keywords: Network security, threat
Abstract: With the emergence of more and more network based critical services, security
problems are drawing more and more attention, making network security research a
hotspot in both the computer and telecommunications fields. This document describes
commonly known network attacks and introduces the network security features of the
H3C S9500 series.
Acronyms:
Acronym Full spelling
DoS Denial of Service
Table of Contents
1 Overview .................................................................................................................................. 3
2 Network Security Threats ........................................................................................................ 3
2.1 Definition of Network Security Threat............................................................................ 3
2.2 Classification of Network Security Threats ................................................................... 3
2.3 Security Threats to Network Devices ............................................................................ 4
2.3.1 Threats at Data Transport Level ......................................................................... 4
2.3.2 Threats at Signaling Level .................................................................................. 4
2.3.3 Threats at Device Management Level ................................................................ 4
3 Security Capabilities of H3C S9500 ........................................................................................ 4
3.1 Security Features at Data Transport Level ................................................................... 5
3.1.1 Defense Against Address Scanning.................................................................... 5
3.1.2 Defense Against DoS/DDoS Attacks .................................................................. 5
3.1.3 Broadcast/Multicast Rate Limit ........................................................................... 6
3.1.4 Defense Against MAC Address Table Capacity Attacks ..................................... 6
3.1.5 Support for Static MAC Address Entries and ARP Entries ................................. 6
3.1.6 Powerful ACL Capabilities................................................................................... 7
3.2 Security Features at Signaling Level ............................................................................ 7
3.2.1 Defense Against ARP Attacks ............................................................................. 7
3.2.2 Address Conflicts Detection ................................................................................ 7
3.2.3 Defense Against TC/TCN Attacks ....................................................................... 8
3.2.4 Defense Against Address Embezzlement........................................................... 9
3.2.5 Defense Against Routing Protocol Attacks ......................................................... 9
3.3 Security Features at Device Management Level .......................................................... 9
3.3.1 Support for User Levels ...................................................................................... 9
3.3.2 Secure Remote Management ............................................................................. 9
3.3.3 Security Auditing ............................................................................................... 10
3.3.4 Access Control .................................................................................................. 10
3.3.5 SFTP Service .................................................................................................... 10
1 Overview
With the evolution of Internet technologies and the explosive growth of the Internet in
scale, Internet applications, which started in scientific research fields, have now reached
every walk of life. More and more network based critical services are emerging and
networks have become the new drive for improvement of productivity and life quality.
However, the Internet is based on IP and therefore has inherent problems in areas
such as security, quality of service, and operation mode, of which security is the
most prominent. In addition, the openness of IP networks makes the security
problem even more complicated.
While the simplicity and openness of IP networks boost the rapid development of the
Internet, they also result in security vulnerabilities. Meanwhile, with the development
of technologies and the acceleration of information delivery, the technical difficulty of
launching attacks on IP networks is falling and attack tools are becoming more
automated, enabling more people to launch attacks. The number of network attack
events increases every year and the resulting economic cost grows higher and
higher. Network security threats not only disturb corporations, but also endanger
national information security, casting a shadow on the development of the Internet.
2 Network Security Threats
2.1 Definition of Network Security Threat
A network security threat refers to the destruction of, or unauthorized access to or
modification of, data that is stored on or transferred across networks, servers, and
desktops. Network security threats are usually implemented with specific techniques
or tools and are challenges to network security.
2.2 Classification of Network Security Threats
Security threats on IP networks fall into two categories: those to the security of hosts
(including user’s hosts and application servers) and those to the security of networks,
mainly network devices such as routers and switches. The former generally attack
specific operating systems, primarily Windows. Examples include viruses
and Trojan horses. The latter mainly attack TCP/IP protocols. This white paper
discusses the latter, namely security threats to network devices.
2.3 Security Threats to Network Devices
Network devices provide functions at three levels: the data transport level, the
signaling level, and the device management level. Accordingly, this section describes
security threats to network devices at these three levels.
2.3.1 Threats at Data Transport Level
The network data transport level is responsible for processing and forwarding of data
entering a network device. Functions of this level may be affected by two types of
attacks:
- Attacks based on high traffic or abnormal packets, which are intended to consume a large quantity of CPU resources so that normal traffic cannot be serviced.
- Attacks targeted at user data, which compromise the confidentiality and integrity of user data by sniffing, tampering with, or deleting it.
2.3.2 Threats at Signaling Level
The signaling level maintains operation of network protocols to control routing and
switching of packets. Routing information sniffing and IP address forging are the main
threats at this level. These threats may cause routing information leakage and abuse.
2.3.3 Threats at Device Management Level
The device management level supports remote management of network devices.
Threats at this level come from two aspects: one is the vulnerabilities of the protocols
(such as Telnet and HTTP) for device management, and the other is management
defects such as the leakage of a management account.
3 Security Capabilities of H3C S9500
The H3C S9500 series are high-end routing switches based on the Comware
software platform. They not only inherit all the security features of the Comware
platform, but also incorporate additional security features of their own.
3.1 Security Features at Data Transport Level
3.1.1 Defense Against Address Scanning
When launching an address scanning attack, an attacker sends a large quantity of IP
packets with different destination IP addresses to a target network. In this case, the
network device connecting the target network has to send a great many ARP
requests in order to deliver the attack packets. If no host exists at the destination
address of an attack packet, the network device has to send destination unreachable
notifications as well. When the target network has many hosts and the attack packets
come in great quantities, the CPU and memory resources of the network device may
be depleted, resulting in network service interruption.
The H3C S9500 series support defense against address scanning attacks. When an
H3C S9500 routing switch receives a packet destined for one of its directly connected
network segments, it checks whether an ARP entry is present for the destination
address. If not, it sends an ARP request and adds a drop entry for the destination
address to prevent subsequent packets to that address from impacting the CPU. If it
later receives a response to the ARP request, it removes the drop entry and adds an
ARP entry. A drop entry expires after a specified period of time. This mechanism can
effectively block attack packets while allowing normal traffic.
The H3C S9500 series provide some configuration commands to enable/disable
defense against address scanning attacks.
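The drop-entry mechanism described above can be sketched as follows. The class, its method names, and the 10-second drop-entry lifetime are illustrative assumptions; only the punt-once-then-drop behavior follows the text.

```python
# Sketch of address-scanning defense: an unresolved destination gets a
# temporary drop entry so repeated packets no longer reach the CPU.
class ArpGuard:
    DROP_LIFETIME = 10.0   # seconds (assumed value)

    def __init__(self):
        self.arp = {}      # resolved destination IP -> MAC address
        self.drops = {}    # unresolved destination IP -> drop-entry expiry

    def handle_packet(self, dst_ip, now):
        if dst_ip in self.arp:
            return "forward"
        expiry = self.drops.get(dst_ip)
        if expiry is not None and now < expiry:
            return "drop"                       # filtered before the CPU
        self.drops[dst_ip] = now + self.DROP_LIFETIME
        return "send-arp-request"               # punt once, then drop

    def arp_reply(self, dst_ip, mac):
        self.drops.pop(dst_ip, None)            # remove the drop entry
        self.arp[dst_ip] = mac                  # add the ARP entry
```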
3.1.2 Defense Against DoS/DDoS Attacks
During a denial of service (DoS) attack, an attacker sends large amounts of
connection requests to the target device to deplete its resources, making the device
unable to function normally or even bringing it down. DoS attacks usually aim at
servers, preventing them from serving legitimate users; hence the name denial of
service.
Distributed denial of service (DDoS) is an upgraded version of DoS. A DDoS attack
can compromise multiple devices at the same time and is more destructive in a
greater range.
The H3C S9500 series can better defend themselves against common DoS and
DDoS attacks such as Spoofing, Land, and Smurf, ensuring that when some protocol
is compromised, the others can function normally. Besides, when a server behind an
H3C S9500 routing switch is targeted by a DoS attack, the switch can assign specific
ACL rules to filter attack packets, so as to ensure that the connected server and hosts
can work normally.
3.1.3 Broadcast/Multicast Rate Limit
Broadcast and multicast packets in great quantities can consume a great deal of
network bandwidth and therefore degrade the forwarding performance of network
devices. When a loop exists on a network, broadcast and multicast packets may even
bring the network down.
The H3C S9500 series have powerful broadcast/multicast packet filtering functions.
Using these functions, you can set an absolute broadcast/multicast rate limit for a port
or a limit on the broadcast/multicast rate percentage. You can also configure ACL
rules to limit the rates at which broadcast packets, multicast packets, and unknown
unicast packets can pass a port.
3.1.4 Defense Against MAC Address Table Capacity Attacks
In a MAC address table capacity attack, the attacker sends a great number of frames
with different, forged source MAC addresses to a target device, making the device
learn a lot of useless MAC addresses. As the capacity of a MAC address table is
limited, the device may then be unable to learn the MAC addresses of legitimate
users. During Layer 2 forwarding, the attack packets may be broadcast in the VLAN,
wasting a lot of bandwidth and impacting the hosts connected to the network device.
With the H3C S9500 series, you can set the maximum number of MAC addresses
that a port or VLAN can learn based on the number of hosts connected to the port or
VLAN, preventing a port or VLAN from using up all the MAC address table resources.
When setting the MAC address limit, you can also specify whether the device should
forward packets with unknown source MAC addresses when the limit is reached. This
allows you to prevent too much broadcast traffic in a VLAN from impacting other
devices.
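The per-port learning limit and the unknown-source forwarding flag can be sketched as follows; the class and its names are illustrative, not the switch's configuration model.

```python
# Sketch of the per-port MAC learning limit: known sources are always
# forwarded; once the limit is reached, a flag decides whether frames
# with unknown source MACs are forwarded or dropped.
class PortMacTable:
    def __init__(self, limit, forward_when_full=False):
        self.limit = limit
        self.forward_when_full = forward_when_full
        self.macs = set()

    def receive(self, src_mac):
        if src_mac in self.macs:
            return "forward"                    # already learned
        if len(self.macs) < self.limit:
            self.macs.add(src_mac)              # normal learning
            return "forward"
        # Limit reached and source unknown:
        return "forward" if self.forward_when_full else "drop"

port = PortMacTable(limit=2)
```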
3.1.5 Support for Static MAC Address Entries and ARP Entries
The H3C S9500 series support static MAC address entries and static ARP entries. By
configuring static MAC address entries, you can ensure the correct forwarding of
Layer 2 frames. By configuring static ARP entries, you can bind MAC addresses to
IP addresses, preventing IP addresses from being embezzled.
3.1.6 Powerful ACL Capabilities
In complicated network environments, various kinds of attack packets may
compromise network devices or the attached hosts.
The H3C S9500 series provide powerful ACL capabilities, allowing identification, rate
limiting, and filtering of packets based on fields at the data link layer, network layer,
and transport layer. ACL rules can not only be based on common criteria such as
ICMP type, IGMP type, TCP port number, UDP port number, IP address, and MAC
address, but can also be based on the TTL, VLAN ID, and EXP fields. In addition,
you can configure ACL rules for a device, a port, or a VLAN as required.
3.2 Security Features at Signaling Level
3.2.1 Defense Against ARP Attacks
Although ARP plays a critical role in data forwarding, it provides no authentication
mechanism. Attackers often use forged ARP packets to launch attacks. The H3C
S9500 series support defense against this kind of attack.
After an H3C S9500 routing switch receives an ARP packet, it hashes the source
MAC address of the packet. Besides, it counts the received ARP packets. When it
detects that the CPU is dropping packets and the number of ARP packets from a
MAC address exceeds the limit, it considers the host an ARP attacker and will log the
event, give an alert message, and add a source MAC address drop entry to filter
packets from the host.
3.2.2 Address Conflicts Detection
If an interface of a network device uses the same IP address as a host or another
network device connected to the interface, an address conflict exists. If the network
device cannot detect the address conflict, the ARP entry for the network device on
the other connected hosts may be updated with a wrong MAC address, preventing
those hosts from communicating with the network device normally.
The H3C S9500 series support address conflicts detection. When an H3C S9500
routing switch receives an ARP packet, it checks whether the source IP address of
the packet is the same as that of the interface connecting the network segment. If yes,
it sends an address conflict notification packet to tell the ARP packet sender that the
IP address has been used. At the same time, it sends a gratuitous ARP broadcast
packet, notifying all hosts and network devices on the segment to use the correct
ARP entry for the IP address. An address conflict alert message may also be
generated and logged, so that network administrators know the situation.
3.2.3 Defense Against TC/TCN Attacks
With Spanning Tree Protocol (STP) enabled, if a port of a device on the network
detects an STP state change, it generates a topology change (TC) or topology
change notification (TCN) message. When another device on the network receives
such a TC or TCN message and finds that the network topology has changed, it
needs to remove the MAC address and ARP entries to avoid using the entries for
data forwarding. If there are a lot of TC or TCN messages on a network, MAC
address and ARP entry flushing will occur frequently and large amounts of ARP
requests will then be broadcasted in the VLAN. In this case, Layer 3 packets may be
dropped and the network may not be able to function normally.
The H3C S9500 series can protect the network against TC/TCN attacks. Upon
receiving a TC/TCN packet, an H3C S9500 routing switch removes the MAC address
entries but does not remove the ARP entries. When relearning a MAC address, it
checks whether there is an ARP entry for the MAC address. If so, it directly modifies
the outbound port of the ARP entry. Modifying ARP entries based on MAC addresses
can avoid packet dropping during Layer 3 forwarding.
Frequent topology change may affect the operation stability of all devices on the
network. The H3C S9500 series can deal with this situation. After receiving the first
TC/TCN message, an H3C S9500 routing switch executes a series of processes
accordingly and starts a timer. Before the timer expires, it does not respond to any
more TC/TCN messages. Once the timer expires, it checks whether it has received
any TC/TCN messages during the period. If so, it performs the flushing operation.
This mechanism helps keep the devices working stably.
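The damping behavior described above (act on the first TC/TCN, suppress during the timer, flush once at expiry if more arrived) can be sketched as follows; the class and the hold-down value are illustrative assumptions.

```python
# Sketch of TC/TCN damping: the first message triggers a flush and
# starts a hold-down timer; messages within the hold-down are only
# remembered, and one deferred flush runs when the timer expires.
class TcDamper:
    HOLD_DOWN = 15.0   # seconds (assumed value)

    def __init__(self):
        self.timer_expiry = None
        self.pending = False

    def on_tc(self, now):
        if self.timer_expiry is None or now >= self.timer_expiry:
            self.timer_expiry = now + self.HOLD_DOWN
            self.pending = False
            return "flush"            # first TC: flush MAC entries now
        self.pending = True
        return "suppressed"           # within hold-down: remember only

    def on_timer_expiry(self, now):
        self.timer_expiry = None
        if self.pending:
            self.pending = False
            return "flush"            # TCs arrived during the hold-down
        return "idle"
```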
3.2.4 Defense Against Address Embezzlement
Address embezzlement refers to the situation where an illegal user exploits the IP
address of a legitimate user. In this case, the network device learns a wrong ARP
entry and the legitimate user cannot get online normally.
The H3C S9500 series can protect users against address embezzlement attacks.
With MAC-to-IP address bindings configured, an H3C S9500 routing switch performs
address validation when learning ARP entries and learns only legal ARP entries.
3.2.5 Defense Against Routing Protocol Attacks
Routing protocol attacks send forged routing update packets to routers that do not
perform routing protocol authentication, populating the routing tables with forged
routes. This may even cause the networks to crash. Experienced attackers may
further launch more severe attacks.
The H3C S9500 series support routing protocol authentication:
(1) OSPF: Plaintext/MD5 authentication between neighboring routers and
plaintext/MD5 authentication within an OSPF area.
(2) IS-IS: Level-1 and Level-2 plaintext/MD5 authentication on interfaces,
plaintext/MD5 authentication within an IS-IS area, and plaintext/MD5
authentication in an IS-IS routing domain.
(3) BGP: MD5 authentication between neighboring routers and within a BGP area.
(4) RIPv2: Plaintext/MD5 authentication between neighboring routers.
3.3 Security Features at Device Management Level
3.3.1 Support for User Levels
The H3C S9500 series provide four user levels (visit, monitor, system, and manage)
and support encryption of user passwords and limit of password attempts. If a user
cannot enter the correct password before the limit is reached, the device will give an
alert message.
3.3.2 Secure Remote Management
The H3C S9500 series support the SSH protocol. Network administrators can log in
to a network device by SSH securely.
3.3.3 Security Auditing
The H3C S9500 series provide basic security auditing functions including security
alarm logging and user operation logging.
3.3.4 Access Control
The H3C S9500 series support 802.1x authentication in port-based mode and MAC-
based mode, guaranteeing secure LAN access.
3.3.5 SFTP Service
Secure FTP (SFTP) allows users to log in to devices and perform remote file
management securely. An H3C S9500 routing switch can function as an SFTP server
or client. When it functions as a client, you can log in to a remote device from it to
perform file management.
Copyright ©2007 Hangzhou H3C Technologies Co., Ltd. All rights reserved.
No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of
Hangzhou H3C Technologies Co., Ltd.
The information in this document is subject to change without notice.
H3C S9500 OSPF/IS-IS/BGP GR Technology White Paper V1.00
Keywords: GR
Abstract: Graceful Restart (GR) ensures continuity of packet forwarding and hence key services
when the routing protocol restarts. GR is a highly reliable technology widely used in
active-standby switchover and system upgrade.
Acronyms:
Acronym Full spelling
OSPF Open Shortest Path First
ISIS Intermediate System-to-Intermediate System
BGP Border Gateway Protocol
GR Graceful Restart
Table of Contents
1 Introduction ............................................................................................................................... 3
2 Typical Networking Analysis ...................................................................................................... 3
3 Features.................................................................................................................................... 4
3.1 Terms ............................................................................................................................. 4
3.2 How GR Works ............................................................................................................... 5
3.3 OSPF GR ....................................................................................................................... 5
3.3.1 Standard OSPF GR ................................................................................... 5
3.3.2 Compatible OSPF GR ............................................................................... 9
3.4 ISIS GR ........................................................................................................................ 13
3.5 BGP GR........................................................................................................................ 16
4 H3C S9500 Features............................................................................................................... 18
1 Introduction
The control plane and forwarding plane of a high-end router/switch are separate from
each other. The control plane controls and manages the whole device, discovering
routes and delivering them to the interface boards. The forwarding plane is
dedicated to data forwarding. The respective processors of these two planes are
functionally independent.
Each time the control plane restarts, all the routing protocols have to restart, the
neighbor relationships between the device and the adjacent devices have to be
rebuilt, and all the routing information databases have to be re-synchronized.
Neighbor relationship interruption triggers route recalculation on neighbors, causing
routing flaps and communication failures.
To solve this problem, the IETF proposed enhancements to routing protocols such as
IS-IS, OSPF, BGP, and LDP, improving the original protocol operating flows. With
these enhancements, when the control plane restarts on a device, the device notifies
its neighbors to temporarily preserve its routing information and their adjacencies
with it. After the protocol restarts, the neighbors help the restarting device restore its
routing information in a very short time. During the restart, no routing flaps occur,
and packet forwarding on the network remains normal. These enhancements are
collectively called Graceful Restart (GR).
GR ensures the continuity of packet forwarding and hence key services when the
routing protocol restarts. GR is widely used in active-standby switchover and system
upgrade.
2 Typical Networking Analysis
GR generally works between neighbors, as shown in the following figure:
Figure 1 Typical GR network application
When its control software restarts, Switch A starts GR and notifies its neighbors
Switch B, Switch C, Switch D, and Switch E to start GR. During the GR process,
Switch A finishes synchronizing routing information with its neighbors and the
forwarding services remain uninterrupted.
3 Features
3.1 Terms
GR Restarter
A GR Restarter is a device whose control plane restarts.
GR Helper
A GR Helper is a neighbor device that assists the GR Restarter in synchronizing
routing information during the GR process.
The GR Restarter and GR Helpers must be GR-capable and perform GR capability
negotiation (including GR capabilities and GR time) in advance. If the negotiation
succeeds, when the control plane of a GR-capable device restarts, the neighbor
devices are notified to become GR Helpers and the routes of the GR Restarter remain
unchanged within the GR time.
3.2 How GR Works
GR works with different routing protocols in a similar way, though the GR flows for
the respective protocols vary.
In the following figure, the solid lines indicate that adjacencies are formed between
Switch A and Switch B, and between Switch A and Switch C, while the dotted lines
indicate that Switch A, Switch B, and Switch C are GR-capable and GR capability
negotiation has been completed among them. When its control plane restarts, Switch
A begins to work as a GR Restarter and its forwarding plane remains normal. Switch
B and Switch C begin to work as GR Helpers, with the routes of the GR Restarter
unchanged. Then, the GR Restarter (Switch A) reestablishes neighbor relationships
with the two GR Helpers (Switch B and Switch C) and receives routing information
from them. When the GR Restarter finishes receiving all the routing information, it
calculates the routes and synchronizes the calculation results to the forwarding plane.
After that, the GR process is complete.
Figure 2 Typical GR networking
This GR process is generic to routing protocols. GR processing details vary with
routing protocols. The following sections describe the GR processing mechanisms for
OSPF, IS-IS, and BGP respectively.
3.3 OSPF GR
OSPF GR has two modes: standard mode and compatible mode.
3.3.1 Standard OSPF GR
1. Packet format
Standard OSPF GR uses an Opaque-LSA (Type 9) to notify a neighbor device to
start the GR process. Known as the Grace-LSA, the Opaque-LSA has an Opaque
type of 3 and Opaque ID of 0. The following figure depicts the Grace LSA format.
Figure 3 Grace LSA format
The following figure shows its TLV format:
Figure 4 TLV format (a 2-byte Type field, a 2-byte Length field, and a variable-length Value field)
The RFC defines three types of TLVs:
• Grace Period TLV
The Grace Period TLV has a Type value of 1 and a Length value of 4, and indicates the
maximum time during which a neighbor acts as a GR Helper. If the GR Restarter has
not completed the GR process before this period expires, the neighbor device stops
working as a GR Helper. Grace-LSAs must contain a Grace Period TLV.
• Graceful Restart Reason TLV
A Graceful Restart Reason TLV has a Type value of 2 and Length value of 1, and
describes the graceful restart reason. Possible values of the Value field are 0 for
unknown reason, 1 for software restart, and 2 for software reloading (upgrade).
Grace-LSAs must contain a Graceful Restart Reason TLV.
• IP Interface Address TLV
An IP interface Address TLV has a Type value of 3 and Length value of 4, and
indicates the IP address of the interface sending the Grace-LSA. This IP address
uniquely identifies the restarting device on a broadcast, NBMA, or P2MP network.
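The three TLVs above can be laid out mechanically. The following Python sketch (function names are hypothetical, for illustration only) packs a Grace-LSA body with the mandatory Grace Period and Graceful Restart Reason TLVs plus the IP Interface Address TLV, each padded to a 4-byte boundary as LSA TLVs conventionally are:

```python
import socket
import struct

def tlv(t: int, value: bytes) -> bytes:
    # Each TLV: 2-byte Type, 2-byte Length (of the value only),
    # then the value padded to a 4-byte boundary.
    pad = (-len(value)) % 4
    return struct.pack("!HH", t, len(value)) + value + b"\x00" * pad

def grace_lsa_body(grace_period: int, reason: int, ip: str) -> bytes:
    """Grace-LSA body per RFC 3623: Type 1 = grace period (seconds),
    Type 2 = restart reason, Type 3 = sending interface address."""
    return (tlv(1, struct.pack("!I", grace_period))
            + tlv(2, bytes([reason]))       # 0 unknown, 1 sw restart, 2 upgrade
            + tlv(3, socket.inet_aton(ip)))
```

A 120-second grace period with reason "software restart" from interface 10.0.0.1 thus encodes into three 8-byte TLVs.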
2. Protocol processing flow
Standard OSPF GR works as follows:
Figure 5 RFC 3623 protocol processing flow
1) Once brought up again, an OSPF interface on the GR Restarter sends a Grace-
LSA.
2) Upon receiving the Grace-LSA, the neighbor starts to act as a GR Helper and
sends an ACK to the GR Restarter.
3) Hello packets are exchanged on the broadcast or NBMA network to elect a DR
and BDR.
4) The GR Restarter begins normal LSDB synchronization. The neighbor state
transits from Exstart, through Exchange and Loading to Full. During this
process, the GR Restarter stores received self-originated LSAs, and labels them
as Stale.
5) The GR Helper also begins normal LSDB synchronization. The neighbor state
transits from Exstart, through Exchange and Loading to Full. During this
transition process, the GR Helper operates as in the FULL state, without
generating any new Router LSAs or Network LSAs.
6) When all the neighbor relationships reach the Full state, that is, are restored,
Grace-LSA flushing is initiated.
7) The GR process is complete, and new LSAs are generated and flooded. The
LSAs labeled as Stale but not regenerated are flushed.
3.3.2 Compatible OSPF GR
1. Packet format
• Link-local Signaling (LLS) Block
Compatible OSPF GR extends the OSPF packet format to carry different types of
application data. The following figure shows the extended OSPF packet format:
Figure 6 Extended OSPF packet format (an IP header, an OSPF header plus OSPF data of total length X, authentication data of length Y, and an LLS block of length Z)
The authentication data and LLS block fields are not included in the OSPF packet
length. Currently, only two types of OSPF packets, Type 1 (OSPF Hello) and Type 2
(OSPF DD), contain an LLS Block, which is identified by the L bit (0x10) in the Options
field.
* * DC L N/P MC E *
Figure 7 Option field of OSPF packet
The LLS Block field adopts an extensible TLV structure, defining two types of TLVs:
Extended Options TLV (EO_TLV) and Cryptographic Authentication TLV (CA_TLV),
as shown in the following figures.
Figure 8 EO_TLV format (Type = 1, Length = 4, followed by a 4-byte Extended Options field)
Figure 9 CA_TLV format (Type = 2, Length = 20, followed by a Sequence Number and Auth Data)
An EO_TLV has a Type value of 1 and a Value field with 4-byte Extended Options for
Option extension in OSPF packets.
• OOB
In traditional OSPF, LSDB resynchronization is performed only when neighbor
relationships are reestablished. Normal LSDB synchronization is carried out through
flooding after neighbor relationships are established. OOB (out-of-band) LSDB
resynchronization is carried out in a network where neighbor relationships have been
established and the network topology is stable.
In traditional OSPF, LSDB resynchronization requires the neighbor state machine to
be in the Exstart state. This causes OSPF to generate new Type-1 LSAs (Router
LSAs), triggering route recalculation.
LR_Bit is introduced in the OOB flow for the OOB capability negotiation between
neighbors. LR_Bit is contained in the Extended Option in an EO_TLV. If the device is
OOB-capable, when sending OSPF Hello packets and DD packets, the device sets
the LR_Bit in the Extended Option of the EO_TLV to 0x00000001.
* * * * * * * * * * LR
Figure 10 LR_Bit
In addition, an R_Bit is introduced in the OOB flow to notify neighbor devices to perform
OOB resynchronization. The R_Bit is carried in DD packets sent to neighbors. In a
DD packet with the R_Bit and the I, M, and MS bits set, the sender requests to start
OOB resynchronization. In this case, if the neighbor state machine is Full, the
device sets the neighbor state to ExStart to start LSDB resynchronization.
During OOB resynchronization, the neighbor state is treated as Full regardless of
whether the state is ExStart, Exchange, or Loading; that is, the device operates as if
the neighbor were in the Full state, and therefore Router LSAs and Network LSAs do
not change, keeping the network stable.
• RS bit
In the compatible mode, an RS_Bit is added to the Extended Option of the EO_TLV
to notify the neighbor to start the GR process. The value of the RS_Bit is 0x00000002.
* * * * * * * * * RS LR
Figure 11 RS_Bit
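The LR and RS bits above live in the Extended Options value of a single EO_TLV. A minimal Python sketch (constant names are mine) of how the two hellos in the compatible-mode exchange would encode that TLV:

```python
import struct

LR_BIT = 0x00000001  # OOB resynchronization capability
RS_BIT = 0x00000002  # request the neighbor to start the GR process

def eo_tlv(extended_options: int) -> bytes:
    # EO_TLV: Type = 1, Length = 4, 4-byte Extended Options value.
    return struct.pack("!HHI", 1, 4, extended_options)

# The restarting device's first hello carries both bits set;
# the helper answers with LR set and RS clear.
restart_hello_tlv = eo_tlv(LR_BIT | RS_BIT)
helper_hello_tlv = eo_tlv(LR_BIT)
```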
2. Protocol processing flow
The following figure shows the processing flow of compatible OSPF GR:
Figure 12 Compatible OSPF GR flow
1) Once brought up again, an OSPF interface of the GR Restarter sends a hello
packet containing LLS Block. The RS bit and LR bit in the Extended Options field
of the EO_TLV in the LLS Block are set.
2) Upon receiving the hello packet, the neighbor skips the two-way state, that is, it
keeps the neighbor state unchanged, enters the GR Helper process flow, and
sends back a hello packet with the LR bit on and RS bit off.
3) After receiving the hello packet with LR bit on, the GR Restarter sets the
neighbor state to 2-way and the subsequent flow is the same as that of the
traditional OSPF protocol. Once the DR election is complete, the first DD packet
(with R_bit on) is sent to start the OOB flow. In the hello packet sent after the DR
election, the RS_bit will not be set.
4) After receiving the DD packet with R_bit set and then setting the corresponding
neighbor state to Exstart, the GR Helper also enters the OOB flow.
5) During LSDB resynchronization, the neighbor state transits from Exchange to
Loading and to Full. During this process, the GR Restarter stores the received
self-originated LSAs, and labels them as Stale.
6) When all the neighbor relationships become FULL and all the routing information
is restored, the GR process is complete. LSAs are regenerated and flooded, and
the LSAs labeled as Stale are not regenerated and are flushed directly.
3.4 ISIS GR
1. Packet format
In IS-IS GR, a new TLV, namely, Restart TLV, is added to IIH packets to notify the
neighbor device to enter the GR flow. This new TLV has a Type value of 211. The
following figure illustrates its Value field:
Figure 13 Value field of a Restart TLV (a 1-byte Flags field, a 2-byte Remaining Time field, and a Restarting Neighbor System ID of ID Length bytes)
The one-byte Flags field records necessary state flags during the restart. The
following figure shows the Flags format:
* * * * * SA RA RR
Figure 14 Flags format
Currently, only the last three bits (SA, RA, and RR) are used. When the control
software restarts, the RR (Restart Request) bit of the first IIH packet sent through
each interface must be set. Upon receiving the IIH packet, the neighbor device must
acknowledge the receipt by sending back an IIH packet with the RA (Restart
Acknowledgement) bit set. The SA (Suppress adjacency advertisement) bit is
optional and used to avoid blackhole routes.
The 2-byte Remaining Time field indicates the time in seconds before the neighbor
ages out. This field must be present together with the RA bit. Upon receiving an IIH
packet with the RR bit set from the restarting device, the neighbor device must
immediately acknowledge the receipt by sending back an IIH packet whose RA bit is
set to 1, filling the time in seconds before the corresponding neighbor (the restarting
device) ages out in the Remaining Time field.
The System ID of the restarting device is filled in the Restarting Neighbor System ID
field.
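Since IS-IS TLVs use 1-byte type and length fields, the Restart TLV above is straightforward to lay out. A sketch (helper name is hypothetical) encoding both a restart request and an acknowledgement:

```python
import struct

RR = 0x01  # Restart Request
RA = 0x02  # Restart Acknowledgement
SA = 0x04  # Suppress adjacency Advertisement

def restart_tlv(flags: int, remaining_time: int = 0,
                neighbor_id: bytes = b"") -> bytes:
    """Restart TLV (type 211) carried in IIH packets: a 1-byte Flags
    field, and, in acknowledgements, a 2-byte Remaining Time plus
    the restarting neighbor's System ID."""
    value = bytes([flags])
    if flags & RA:
        value += struct.pack("!H", remaining_time) + neighbor_id
    # IS-IS TLVs use 1-byte type and 1-byte length fields.
    return bytes([211, len(value)]) + value

request = restart_tlv(RR)                     # restarting device's IIH
ack = restart_tlv(RA, 300, bytes(6))          # helper's reply, 6-byte System ID
```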
2. Protocol processing flow
In IS-IS GR, three timers, namely, T1, T2, and T3 are defined.
• T1 timer
Like the IIH timer, the T1 timer is defined on each interface. It defines the interval for
sending IIH packets with the RR bit set and defaults to three seconds. When the
device restarts, a T1 timer is created on each interface and an interface periodically
sends IIH packets with RR bit set. The T1 timer on the interface is not removed until
the interface receives the IIH acknowledge packet with RA bit set and the complete
CSNP packet.
• T2 timer
The T2 timer defines the maximum wait time of LSDB resynchronization and defaults
to 60 seconds. Each LSDB has such a timer.
• T3 timer
The T3 timer defines the maximum restart time in IS-IS. Once the T3 timer expires,
the GR process ends regardless of whether the LSDB resynchronization is complete
and the normal IS-IS processing flow begins. Upon initialization, the T3 timer is set to
65535 seconds. After all interfaces receive the IIH acknowledge packets with the RA
bit set, the T3 timer is reset based on the minimum among the Remaining Time
values of these packets.
The following figure depicts the IS-IS GR working flow:
Figure 15 IS-IS GR flow
1) When IS-IS is re-enabled on the GR Restarter, T2 and T3 timers are enabled
globally. When an interface is brought up again, the T1 timer is started on the
interface (Different from the original protocol flow, when the interface is up, the
T1 timer, instead of the IIH timer, is started), and an IIH packet with the RR bit
set is sent.
2) After receiving the IIH packet, the neighbor leaves the neighbor state of the
sender unchanged and sends back an IIH packet with the RA bit set, filling the
GR Restarter's remaining aging time and System ID in the Remaining Time and
Restarting Neighbor System ID fields of the Restart TLV respectively. If the
interface is a broadcast interface, a DIS election is performed,
which is different from traditional IS-IS DIS election. If it is elected as the DIS, the
interface sends CSNP packets and all LSPs. If the interface is a P2P interface, it
directly sends CSNP packets and all the LSPs.
3) After receiving the IIH packet with the RA bit set and all the CSNP packets, the GR
Restarter removes the T1 timer. Otherwise, the GR Restarter periodically sends
IIH packets with the RR bit set, and does not remove the T1 timer until it has received
the IIH packet with the RA bit set and complete CSNP packets, or until the
maximum number of T1 timer expirations is reached.
4) Once the GR Restarter finds that the LSDB resynchronization at a level is
complete, it removes the T2 timer of the level.
5) After removing all the T2 timers, the GR Restarter removes the T3 timer and
enters the normal IS-IS flow.
3.5 BGP GR
1. Packet format
• Graceful Restart capability
BGP GR defines a new BGP capability, known as the Graceful Restart capability,
with a capability value of 64. The following figure shows its Value field.
Figure 16 BGP GR Capability Value
The R bit identifies the Restart State. When it is set to 1, it indicates that the sender is
restarting, and the receiver can send routing information without waiting for the End-of-
RIB marker from the sender. This prevents deadlock when multiple BGP speakers
wait for each other's End-of-RIB markers.
The Restart Time field indicates the maximum time that routes are held after the peer
is detected down. The <AFI, SAFI, Flags for address family> fields indicate the address
families for which the GR feature is supported. GR can support IPv4 and IPv6 at the
same time.
In BGP GR, an End-of-RIB marker is defined to speed up the BGP GR process.
An Update message with both the reachable NLRI and the withdrawn NLRI empty is
designated as the End-of-RIB marker. After a BGP connection is established, this
marker notifies the peer that the initial route advertisement is complete.
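For IPv4 unicast, an Update with both fields empty is simply the minimum-length Update message: a 19-byte BGP header followed by two zero 2-byte length fields (23 bytes total). A sketch (function name is mine) of how a receiver could recognize it:

```python
import struct

def is_end_of_rib_ipv4(update: bytes) -> bool:
    """True if a BGP message is the IPv4 End-of-RIB marker: an Update
    (type 2) with no withdrawn routes, no path attributes, and no NLRI,
    i.e. exactly 23 bytes long."""
    if len(update) != 23 or update[18] != 2:   # byte 18 is the type field
        return False
    withdrawn_len, attr_len = struct.unpack("!HH", update[19:23])
    return withdrawn_len == 0 and attr_len == 0

# Construct the marker: 16-byte all-ones marker field, length 23, type 2,
# zero withdrawn-routes length, zero path-attribute length.
end_of_rib = (b"\xff" * 16
              + struct.pack("!HB", 23, 2)
              + struct.pack("!HH", 0, 0))
```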
2. Protocol processing flow
The following figure depicts the BGP GR working flow:
Figure 17 BGP GR flow
1) Switch A sends an Open message containing the IPv4 GR capability to the
neighbor.
2) The Open message sent by Switch B also contains the IPv4 GR capability.
3) Switch A restarts, and sends an Open message with R_bit set to request Switch
B to start the GR Helper processing flow; the maximum route holdtime in the
message is 180 seconds.
4) Upon receiving the Open message, Switch B starts the GR Helper processing
flow, labels all the IPv4 routes received from Switch A before as Stale, and holds
the routes for 180 seconds before deletion. Other flows are the same as those of
traditional BGP.
5) Optimal route selection is not performed during this process.
6) After sending all the Update messages, Switch B sends an End-of-RIB to notify
update completion.
7) After receiving all the routing information, Switch A performs optimal route
selection and resends Update messages to notify neighbor devices to update
their routes.
8) Switch B deletes the routes labeled as stale.
4 H3C S9500 GR Characteristics
The routing protocols running on the S9500 series switches have rich GR features
that allow excellent fault tolerance and compatibility. Each protocol can interoperate
with devices of other vendors. OSPF GR, in particular, supports both the standard and
compatible modes and is therefore scalable in GR networking.
Copyright ©2007 Hangzhou H3C Technologies Co., Ltd. All rights reserved.
No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of
Hangzhou H3C Technologies Co., Ltd.
The information in this document is subject to change without notice.
S9500 QoS Technology White Paper
Copyright © 2007 Hangzhou H3C Technologies Co., Ltd. Page 1/15
H3C S9500 QoS Technology White Paper
Key words: QoS, quality of service
Abstract: Ethernet is now widely applied. At present, Ethernet is the leading
technology in various independent local area networks (LANs), and many Ethernet LANs
have become part of the Internet. With the development of Ethernet technology, most
common Internet users access the Internet through Ethernet. To implement end-to-end
QoS throughout the network, you must guarantee QoS on Ethernet. To do this, Ethernet
switching devices must use QoS technology to provide different QoS guarantees for
different types of traffic flows, especially those with higher demands for delay
and jitter guarantees.
Acronyms:
Acronym Full spelling
QoS Quality of Service
Table of Contents
1 Overview................................................................................................................................... 3
2 Basic Networking Structure........................................................................................................ 3
3 Features.................................................................................................................................... 4
3.1 Service Model ................................................................................................................. 4
3.2 Traffic Classification ........................................................................................................ 4
3.3 Traffic Policing................................................................................................................. 5
3.4 Priority Marking............................................................................................................... 6
3.5 Queue Scheduling........................................................................................................... 8
3.6 Congestion Avoidance................................................................................................... 10
3.7 Traffic Shaping.............................................................................................................. 12
3.8 Policy Routing............................................................................................................... 13
4 QoS Processing Procedure on the S9500 Series..................................................................... 14
1 Overview
On traditional packet switching networks, switches and routers treat all packets equally and handle them using the first in first out (FIFO) policy. This service model is called best-effort. It delivers packets to their destinations as best it can, without any guarantee of delay or jitter.
With the development of computer networks, more and more traffic that is sensitive
to bandwidth, delay, and jitter, such as voice, video, and critical data, is transmitted
over networks. This greatly enriches the services carried on a network. On the
other hand, it places higher demands on the quality of service (QoS) of network
transmission.
Ethernet is now widely applied. At present, Ethernet is the leading technology in
various independent local area networks (LANs), and many Ethernet LANs have
become part of the Internet. With the development of Ethernet technology, most
common Internet users access the Internet through Ethernet. To implement
end-to-end QoS throughout the network, you must guarantee QoS on Ethernet.
To do this, Ethernet switching devices must use QoS technology to provide
different QoS guarantees for different types of traffic flows, especially those with
higher demands for delay and jitter guarantees.
2 Basic Networking Structure
Figure 1 Basic networking structure
3 Features
3.1 Service Model
A service model is a set of end-to-end QoS capabilities. The simplest service
model is the best-effort model, which adopts the FIFO policy. It delivers packets to their
destinations as best it can, without any guarantee of delay or jitter. The Diff-Serv
model was introduced to implement QoS for network transmission. The Diff-Serv
model is a multi-service model. It provides QoS services for each packet according to
the QoS parameters specified for the packet, thus satisfying differentiated QoS
demands. The Diff-Serv model can implement end-to-end QoS for critical
services.
The S9500 series support the Diff-Serv model.
3.2 Traffic Classification
To specify different QoS parameters for packets of different levels, the Diff-Serv
model must classify the network traffic first. Traffic classification organizes packets
with different characteristics into different classes using classification rules. A
classification rule is a filter rule configured to meet your management requirements. It
can be very simple. For example, you can use a classification rule to identify traffic
with different priorities according to the ToS field in the IP packet header. It can be
very complicated too. For example, you can use a classification rule to identify the
packets according to the combination of link layer (Layer 2), network layer (Layer 3),
and transport layer (Layer 4) information including MAC addresses, IP protocol,
source addresses, destination addresses, port numbers of applications, and so on.
Generally, traffic classification criteria are limited to the header of an encapsulated
packet; packet payloads are rarely used for traffic classification.
The S9500 series support Layer 2, Layer 3, and Layer 4 ACL rules for traffic
classification. Such ACL rules can classify packets based on source MAC addresses,
destination MAC addresses, VLAN IDs, source IP addresses, destination IP
addresses, source TCP/UDP port numbers, destination TCP/UDP port numbers,
protocol types, IP precedence, ToS precedence, DSCP precedence, and whether
packets are fragmented.
3.3 Traffic Policing
To use limited network resources to provide customers with better services, you can
enable traffic policing on the incoming port for the traffic of the specified customers,
thus making the traffic adapt to the network resources assigned to it. Traffic policing
uses token buckets for traffic control.
Figure 2 Traffic policing
Figure 2 depicts the processing procedure of traffic policing. First, packets are
classified and the packets with the specified characteristics enter the token bucket for
processing. If the token bucket has enough tokens for sending the packets, the
packets can pass through; otherwise, the packets are dropped. In this way, you can
control the traffic of a certain class of packets.
The system puts tokens into the bucket at the set rate. You can set the capacity of
the token bucket. When the token bucket is full, the extra tokens will overflow and the
number of tokens in the bucket stops increasing. When the token bucket processes
packets, if it has enough tokens for sending these packets, the packets are sent, and
at the same time, the corresponding number of tokens are taken out of the bucket. If
the token bucket does not have enough tokens for sending these packets, these
packets are dropped. Therefore, the traffic rate is restricted under the rate of
generating tokens, thus implementing traffic control.
The S9500 series support traffic policing with the granularity of 8 kbps.
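The token-bucket behavior described above can be captured in a few lines. This is a minimal single-rate sketch for illustration; the S9500's hardware implementation (with its 8-kbps granularity) works differently in detail:

```python
import time

class TokenBucket:
    """Single-rate token bucket: tokens accrue at `rate` bytes/sec up to
    `capacity`; a packet conforms only if enough tokens are available."""

    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity          # bucket starts full
        self.last = time.monotonic()

    def conform(self, packet_size: int) -> bool:
        now = time.monotonic()
        # Add tokens for the elapsed interval; overflow is discarded,
        # so the token count never exceeds the bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet_size:
            self.tokens -= packet_size  # packet passes, tokens consumed
            return True
        return False                    # not enough tokens: drop the packet
```

Because overflow tokens are discarded, the long-term rate of conforming traffic cannot exceed the token-generation rate, while the bucket capacity bounds the permissible burst size.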
3.4 Priority Marking
By marking packets with different priorities, you can identify the service levels of
different packets. The S9500 series can mark ToS precedence, differentiated
services codepoint (DSCP) precedence, and 802.1p precedence for specific packets.
These priority types apply to, and are defined in, different QoS models. The following
part introduces IP precedence, ToS precedence, DSCP precedence, 802.1p
precedence, and EXP precedence.
I. IP precedence, ToS precedence, and DSCP precedence
Figure 3 IP precedence, ToS precedence, and DSCP precedence
As shown in Figure 3 , the ToS field of the IP header contains 8 bits: the first three
bits (0 to 2) represent IP precedence from 0 to 7; the following 4 bits (3 to 6)
represent a ToS value from 0 to 15. In RFC2474, the ToS field of the IP header is
redefined as the DS field, where a DiffServ code point (DSCP) precedence is
represented by the first 6 bits (0 to 5) and is in the range 0 to 63. The remaining 2 bits
(6 and 7) are reserved.
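The bit positions above translate into simple shifts and masks. A small sketch (assuming the ToS/DS byte has already been read from the IP header):

```python
def ip_precedence(tos: int) -> int:
    # IP precedence: the three most significant bits (0-2) of the ToS byte.
    return (tos >> 5) & 0x7

def dscp(tos: int) -> int:
    # RFC 2474: the DSCP is the first 6 bits (0-5) of the re-named DS field;
    # bits 6 and 7 are reserved and ignored here.
    return (tos >> 2) & 0x3F
```

For example, a DS byte of 0xB8 carries DSCP 46 (expedited forwarding), which maps onto IP precedence 5 in the old interpretation.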
II. 802.1p precedence
802.1p precedence lies in Layer 2 packet headers and is applicable to occasions
where the Layer 3 packet header does not need analysis but QoS must be
guaranteed at Layer 2.
Figure 4 802.1Q Ethernet frame format
As shown in the figure above, a host supporting the 802.1Q protocol inserts a 4-byte
802.1Q tag header after the source address field of the original Ethernet frame header
when sending a packet. The 4-byte 802.1Q tag header contains a 2-byte Tag
Protocol Identifier (TPID) with the value 0x8100 and a 2-byte Tag Control Information
(TCI) field. The TPID is a new field defined by IEEE to indicate that the current packet is
802.1Q-tagged. Figure 5 describes the detailed contents of an 802.1Q tag header.
Figure 5 802.1p precedence
In the figure above, the 3-bit priority field in the TCI is the 802.1p priority, in the range
0 to 7. These eight priority values determine which packets are sent preferentially
when congestion occurs. The precedence is called 802.1p precedence because the
applications of this field are defined in detail in the 802.1p specifications.
To provide differentiated services for VLAN VPN or QinQ frames, you must classify
frames by the VLAN IDs or 802.1p precedence in their inner VLAN tags. The 802.1p
precedence of the inner VLAN tag then determines the scheduling priority and drop
precedence of a packet at the egress.
Figure 6 802.1p precedence mapping
III. EXP precedence
Figure 7 MPLS label
In an Ethernet MPLS packet, a shim header sits between the Layer 2 header and the
Layer 3 data. The shim contains a 3-bit EXP field, which can be used to determine the
scheduling priority and drop precedence of the packet: you can classify MPLS
packets by their EXP precedence, or map the DSCP precedence of IP packets to the
EXP precedence, and then use the EXP precedence to determine the scheduling
priority and drop precedence of MPLS packets at the egress.
Figure 8 EXP precedence marking
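The shim layout and one common DSCP-to-EXP mapping (taking the 3 most significant DSCP bits) can be sketched as follows; the function names are illustrative:

```python
def mpls_fields(shim: int):
    """Decompose a 32-bit MPLS shim: 20-bit label, 3-bit EXP, 1-bit S, 8-bit TTL."""
    label = (shim >> 12) & 0xFFFFF
    exp = (shim >> 9) & 0x7
    bottom_of_stack = (shim >> 8) & 0x1
    ttl = shim & 0xFF
    return label, exp, bottom_of_stack, ttl

def dscp_to_exp(dscp: int) -> int:
    """One common mapping: use the 3 most significant DSCP bits as EXP."""
    return (dscp >> 3) & 0x7

assert dscp_to_exp(46) == 5   # DSCP EF maps to EXP 5 under this mapping
```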
3.5 Queue Scheduling
When the network is congested, contention among packets for resources must be
resolved, usually through queue scheduling. The S9500 series support two queue
scheduling algorithms: strict priority (SP) and weighted round robin (WRR).
I. SP queue scheduling algorithm
Figure 9 Diagram for SP queueing
The SP queue scheduling algorithm is designed for critical service applications. The
key characteristic of mission-critical applications is that they require preferential
service to reduce response delay when congestion occurs. Assume that a port has
eight output queues. SP queueing classifies them into eight classes, queue 7 through
queue 0, in descending order of priority.
SP schedules packets in strict priority order. It sends the packets in the
highest-priority queue first and serves a lower-priority queue only when all
higher-priority queues are empty. You can put critical service packets into the
higher-priority queues and non-critical service packets (such as e-mail) into the
lower-priority queues. In this way, critical service packets are sent preferentially, and
non-critical service packets are sent only when no critical service packets are waiting.
The SP mechanism has a disadvantage: when congestion occurs and the
high-priority queues stay occupied for a long time, the packets in the lower-priority
queues are "starved" of service.
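The SP selection rule above can be sketched in a few lines (a simplified model, not device code):

```python
from collections import deque

# Eight queues; index 7 is the highest priority.
queues = [deque() for _ in range(8)]

def sp_dequeue():
    """Strict priority: always serve the highest-priority non-empty queue."""
    for prio in range(7, -1, -1):
        if queues[prio]:
            return queues[prio].popleft()
    return None  # all queues empty

queues[6].append("voice")
queues[1].append("email")
assert sp_dequeue() == "voice"   # the higher-priority queue is served first
assert sp_dequeue() == "email"
```

Note how a steady stream into queue 7 would keep queue 0 waiting indefinitely, which is exactly the starvation problem described above.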
II. WRR queue scheduling algorithm
A switch port supports eight output queues. The WRR queue scheduling algorithm
schedules all the queues in turn, so every queue is assured a certain service time.
Assume there are eight priority queues on a port. WRR assigns each queue a weight
value, w7 through w0, which indicates the proportion of bandwidth the queue obtains.
On a 100-Mbps port, if you configure the WRR weights as 50, 30, 10, 10, 50, 30, 10,
and 10 (corresponding to w7 through w0 in order), even the lowest-priority queue is
guaranteed at least 5 Mbps of bandwidth. This avoids the disadvantage of SP queue
scheduling, where packets in lower-priority queues may be starved of service for a
long time.
Another advantage of WRR queueing is that, although the queues are scheduled in
order, the service time for each queue is not fixed: if a queue is empty, the next
queue is scheduled immediately. In this way, the bandwidth resources are fully used.
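The weighted sharing described above can be verified with a small calculation (illustrative helper, not device configuration):

```python
def wrr_shares(weights, link_mbps):
    """Minimum bandwidth guaranteed to each queue under WRR: each queue
    gets its weight's share of the total weight."""
    total = sum(weights)
    return [link_mbps * w / total for w in weights]

# Weights w7..w0 from the example above, on a 100-Mbps port:
weights = [50, 30, 10, 10, 50, 30, 10, 10]
shares = wrr_shares(weights, 100)
assert min(shares) == 5.0   # the lowest-weight queues still get 5 Mbps
```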
3.6 Congestion Avoidance
When the network is congested, common network devices adopt tail drop: when the
queue length reaches the upper threshold, all newly arriving packets are dropped.
However, if a large amount of TCP traffic is dropped, TCP timeouts occur, triggering
the TCP slow start and congestion avoidance mechanisms and thus reducing TCP
traffic. If a queue drops packets of multiple TCP sessions at the same time, slow start
and congestion avoidance are triggered for all these sessions simultaneously. This is
called global TCP synchronization. In this case, the TCP sessions reduce their traffic
to the queue at the same time, so the traffic sent to the queue falls below the queue's
bandwidth and line utilization drops. Moreover, the traffic sent to the queue is not
stable, fluctuating between the maximum bandwidth and a very small value.
The S9500 series adopt the Weighted Random Early Detection (WRED) mechanism
to avoid global TCP synchronization. You can set the upper threshold and lower
threshold for a queue. When the queue length is smaller than the lower threshold, no
packet is dropped; when the queue length is between the lower threshold and the
upper threshold, WRED begins to drop packets randomly, with the drop probability
increasing as the queue length increases; when the queue length exceeds the upper
threshold, all newly arriving packets are dropped.
WRED drops packets randomly, thus avoiding global TCP synchronization. When a
TCP session slows down after its packets are dropped, the other TCP sessions
continue sending at high rates. Because some TCP sessions always keep sending at
high rates, the link bandwidth is fully utilized.
If the current queue length is compared with the upper threshold and lower threshold
to determine the drop policy, bursty traffic is not fairly treated and proper data
transmission is affected. To solve this problem, WRED compares the average queue
size with the lower threshold and upper threshold to determine the drop policy. The
average queue size reflects the queue size change trend but is not sensitive to bursty
queue size changes, and thus bursty traffic can be fairly treated.
On an S9500 switch, you can set the exponential factor for average queue length
calculation, the upper threshold, the lower threshold, and the drop probability for
packets with different precedence values to provide differentiated drop policies.
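The WRED behavior described above can be sketched as follows; the exponential factor drives a moving average of the queue length, and the drop probability rises linearly between the two thresholds (a simplified model, not the S9500 implementation):

```python
def wred_update_avg(avg, current, exponent):
    """EWMA average queue length; a larger exponent makes the average
    less sensitive to bursty queue size changes."""
    weight = 2 ** -exponent
    return (1 - weight) * avg + weight * current

def wred_drop_prob(avg, lower, upper, max_prob):
    """Drop probability rises linearly between the lower and upper thresholds."""
    if avg < lower:
        return 0.0            # below the lower threshold: no drops
    if avg >= upper:
        return 1.0            # above the upper threshold: drop all new arrivals
    return max_prob * (avg - lower) / (upper - lower)

assert wred_drop_prob(10, 20, 40, 0.1) == 0.0
assert abs(wred_drop_prob(30, 20, 40, 0.1) - 0.05) < 1e-12
assert wred_drop_prob(45, 20, 40, 0.1) == 1.0
```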
When congestion occurs, the S9500 switch drops packets as soon as possible to
release queue resources and try not to assign packets to high-delay queues in order
to eliminate congestion.
An S9500 switch can assign drop levels to packets according to their 802.1p
precedence, that is, color the packets, or assign drop levels through priority marking.
The drop level can be 0, 1, or 2, representing green, yellow, and red respectively.
When congestion occurs, red packets are the first to be dropped and green packets
are the last.
You can set congestion avoidance parameters and thresholds for each queue and
each drop level.
The S9500 series support two drop algorithms:
•	Tail drop: the drop policy for packets of a color (red, yellow, or green, assigned
according to drop levels) is determined by the threshold set for that color. When
the number of packets of a color exceeds the corresponding upper threshold,
the system begins to drop newly arriving packets of that color.
•	WRED drop algorithm: the drop levels are taken into account when packets are
dropped by queue. When the number of packets of a color (red, yellow, or green)
exceeds the lower threshold set for that color, the system begins to drop packets
of that color between the lower threshold and upper threshold according to a
certain slope. When the number of packets of a color exceeds the upper
threshold set for that color, the system drops all packets of that color exceeding
the upper threshold.
3.7 Traffic Shaping
Traffic shaping controls the rate of output traffic, so that the traffic can be sent out at
an even rate. Normally, traffic shaping is applied on a device to adapt its output rate
to the input rate of its connected downstream device so as to avoid unnecessary
packet drop and congestion. It differs from traffic policing mainly in that traffic shaping
buffers packets exceeding the rate limit so that packets are sent out at an even rate,
while traffic policing drops packets exceeding the rate limit. However, traffic shaping
introduces additional delay while traffic policing does not. The S9500 series support
port-based traffic shaping, that is, traffic shaping can be implemented to all traffic on a
port. It also supports queue-based traffic shaping on a port.
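The buffering behavior that distinguishes shaping from policing can be sketched with a token bucket (a simplified single-threaded model; class and parameter names are illustrative):

```python
class TokenBucketShaper:
    """Minimal token-bucket sketch: tokens accrue at `rate` bytes/sec up to
    `burst`; a packet is sent only when enough tokens exist, otherwise it
    waits in a buffer (shaping) instead of being dropped (policing)."""

    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = burst, 0.0
        self.buffer = []

    def arrive(self, now, size):
        # Accrue tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if not self.buffer and size <= self.tokens:
            self.tokens -= size
            return "sent"
        self.buffer.append(size)      # buffered for later, not dropped
        return "buffered"

shaper = TokenBucketShaper(rate=1000, burst=1500)
assert shaper.arrive(0.0, 1500) == "sent"       # within the burst allowance
assert shaper.arrive(0.1, 1500) == "buffered"   # only 100 tokens accrued
```

The buffered packet adds delay but is not lost, which is the trade-off against policing noted above.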
3.8 Policy Routing
Figure 10 Policy routing application scenario
The S9500 series can classify packets first and then configure traffic redirecting for a
certain class of packets to implement policy routing. As shown in Figure 10, the
S9500 switch first classifies packets by source and destination IP address to identify
packets whose source IP addresses are private and whose destination IP addresses
are public. You can then use policy routing to redirect such packets to the NAT device
for address translation and then on to the Internet.
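The classification step above (private source, public destination) can be sketched with the standard library; the function name is illustrative:

```python
import ipaddress

def should_redirect_to_nat(src: str, dst: str) -> bool:
    """Policy-routing classifier sketch: match packets with a private
    source address and a globally routable destination address."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    return s.is_private and d.is_global

assert should_redirect_to_nat("192.168.1.10", "8.8.8.8")
assert not should_redirect_to_nat("192.168.1.10", "10.0.0.1")
```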
4 QoS Processing Procedure on the S9500 Series
Figure 11 QoS processing procedure on the S9500 series
The S9500 series use traffic classification to classify traffic based on source MAC
addresses, destination MAC addresses, Ethernet types, VLANs, 802.1p priority, IP
protocol, source IP addresses, destination IP addresses, application port numbers,
ICMP packet types, IP precedence, ToS, DSCP, EXP, and the VLAN IDs and 802.1p
priorities in the inner VLAN tags of QinQ frames.
After classifying traffic into different classes, besides simply permitting a class of
packets to pass through or dropping a class of packets, the S9500 series provide a
policy control list (PCL) to perform the following actions for the traffic flows: traffic
policing, traffic accounting, marking QoS parameters (including 802.1p priority, DSCP,
EXP, and drop precedence), traffic mirroring, traffic redirecting, and specifying the
output queue.
After packets are marked with different drop levels through priority mapping, the
congestion avoidance module determines the drop policies for packets based on the
user-defined drop mode and the upper and lower thresholds set for each color. With
tail drop adopted, when the number of packets of a color (red, yellow, or green)
exceeds the upper threshold set for that color, the system begins to drop newly
arriving packets of that color. With WRED adopted, when the number of packets of a
color exceeds the lower threshold set for that color, the system begins to drop
packets of that color between the lower and upper thresholds according to a certain
slope; when the number of packets of a color exceeds the upper threshold set for that
color, the system drops all packets of that color exceeding the upper threshold.
After congestion avoidance is completed, the packets permitted to be forwarded are
assigned to the corresponding queues. The queue scheduling module uses SP or
WRR queue scheduling algorithm to schedule packets. When forwarding packets, the
output port performs traffic shaping for outbound traffic based on the token bucket
size.
Copyright ©2007 Hangzhou H3C Technologies Co., Ltd. All rights reserved.
No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of
Hangzhou H3C Technologies Co., Ltd.
The information in this document is subject to change without notice.
S9500 RPR Technology White Paper (V2.00)
Copyright © 2007 Hangzhou H3C Technologies Co., Ltd. Page 1/21
H3C S9500 RPR Technology White Paper
(Version 2.00)
Keywords: RPR
Abstract: RPR is a MAC layer technology designed for carrying large-capacity data services in
metropolitan area networks (MANs). It is physical layer independent, capable of running
over SONET/SDH, DWDM, and Ethernet. It can provide efficient, flexible network
solutions for broadband IP-based MAN carriers.
Acronyms:
Acronym Full spelling
DWDM Dense Wavelength Division Multiplexing
LSP Label Switched Path
MPLS Multi Protocol Label Switching
RPR Resilient Packet Ring
SDH Synchronous Digital Hierarchy
SONET Synchronous Optical Network
STP Spanning Tree Protocol
VRRP Virtual Router Redundancy Protocol
Table of Contents
1 Overview
2 RPR Features
2.1 Concepts
2.1.2 Span
2.1.3 Edge
2.1.4 Wrapping
2.1.5 Steering
2.1.6 Host
2.1.7 Ringlet Selection Table
2.2 Protocol Processing Mechanism
2.2.1 Data Operations on RPR Stations
2.2.2 Efficient Bandwidth Use
2.2.3 Automatic Topology Discovery
2.2.4 Topology Protection and Self-Recovery
2.2.5 Fairness Algorithm
2.2.6 QoS Guarantee
2.2.7 Others
2.3 RPR Data Frame Format
3 RPR Applications
3.1 Layer 3 Application
3.2 Layer 2 Application
4 RPR Features on the S9500
4.1 Powerful Service Switching Performance
4.2 Complete QoS Capabilities
4.3 Abundant Ring Selection Mechanisms
4.4 Layer 2 Bridging + L2 Tunneling
4.5 Compatibility with Ethernet Protection Mechanisms
4.6 Complete Clock Schemes
4.7 Ease of Configuration
4.8 RPR Implementation on the S9500
1 Overview
Resilient packet ring (RPR) is a MAC layer technology standardized by the IEEE
802.17 working group. It is independent of the physical layer and can run over
SONET/SDH, fast Ethernet, and DWDM.
The RPR technology combines the high reliability of SDH self-recovery with Ethernet
advantages such as economy, high bandwidth, flexibility, and scalability. It provides
bandwidth management with data optimization and high-performance multi-service
transmission on a ring topology.
2 RPR Features
2.1 Concepts
Figure 1 RPR ring topology
2.1.2 Span
In a bidirectional ring, the section between two adjacent stations is called a span.
A span comprises two bidirectional links.
2.1.3 Edge
A span on which data frames are not allowed to pass is called an edge. An edge
can result from fiber cut, signal attenuation, manual switch, or any other error or
protection action.
2.1.4 Wrapping
Wrapping is a protection mode of RPR. In wrapping mode, after a span or station
fails, protected traffic is directed at the point of failure to the opposing ringlet. The
two ringlets thus form a closed single ring around the point of the failure. As the
wrapping allows quick switchover without ringlet selection update, data frame
loss is minimized, but at the price of bandwidth.
2.1.5 Steering
Steering is another protection mode of RPR. Unlike in wrapping mode, in
steering mode, the RPR stations on the ringlet update the ringlet selection upon
detection of an edge. Based on the update result, the protected traffic is steered
to the newly selected ringlet. The steering mode thus avoids the bandwidth
waste with wrapping mode, but as it requires topology reconvergence, it can
cause frame loss and service interruption.
2.1.6 Host
For the purpose of this document, the upper layer of the RPR MAC layer is
referred to as the host. The host receives, processes, and transmits the traffic
destined for the local station.
2.1.7 Ringlet Selection Table
Each RPR station maintains a ringlet selection table, which includes information
such as ringlets and hops to reach other stations on the RPR ring.
When the ring is closed, two paths are available for reaching a destination, of
which the shortest one is selected by default.
2.2 Protocol Processing Mechanism
RPR comprises dual counter-rotating unidirectional ringlets, identified as Ringlet 0
and Ringlet 1, whose links operate at the same rate. The two ringlets can transmit
data at the same time.
Each RPR station is identified by a 48-bit MAC address, as in Ethernet, and the MAC
address of a station must be unique on the RPR ring. The two physical optical
interfaces of an RPR station are regarded as one logical interface from the
perspective of the network layer and the link layer.
2.2.1 Data Operations on RPR Stations
Stations on an RPR ring handle data frames by performing the following
operations:
•	Insert: to place a frame received from outside the RPR ring onto a ringlet.
•	Transit: to pass a frame on to the next station. As the frame is forwarded
directly at the RPR MAC layer, the throughput of the RPR station is improved.
For a multicast or broadcast frame, the RPR station also sends a copy to the
upper layer.
•	Copy: to deliver an inbound frame from the ring to the upper layer. Copying a
frame does not remove it from the ring.
•	Strip: to remove a frame from a ringlet. A station strips a frame if the frame is
destined for or sourced from the local station, or if the time to live (TTL) value of
the frame expires.
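The decision rules above can be sketched as follows (a simplified model; the frame is represented as a hypothetical dict, not an 802.17 frame structure):

```python
def handle_frame(frame, my_mac):
    """Decide what an RPR station does with a frame arriving on the ring.
    `frame` is a hypothetical dict with 'src', 'dst', 'ttl', and 'multicast'."""
    if frame["ttl"] <= 0 or frame["src"] == my_mac:
        return "strip"                   # expired, or traveled back to its source
    if frame["multicast"]:
        return "copy-and-transit"        # deliver locally and pass along
    if frame["dst"] == my_mac:
        return "copy-and-strip"          # destination stripping
    return "transit"                     # forward at the MAC layer

f = {"src": "A", "dst": "C", "ttl": 254, "multicast": False}
assert handle_frame(f, "B") == "transit"
assert handle_frame(f, "C") == "copy-and-strip"
```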
The following figure shows how a unicast data frame is transmitted on an RPR
ringlet.
Figure 2 Unicast traffic forwarding
As shown in the figure, the source station inserts the unicast frame onto Ringlet 0 or
Ringlet 1, the transit stations pass the frame along, and the destination station copies
and strips it.
For a multicast or broadcast frame, the stations on the RPR ring copy and transit the
frame; when the frame travels back to the source, the source station strips it from the
ring.
Figure 3 Multicast/broadcast traffic forwarding
2.2.2 Efficient Bandwidth Use
RPR allows efficient bandwidth use on a ring network:
•	Destination stripping: Different from traditional ring technologies such as
SDH/SONET, where a unicast frame is removed from the ring only after it
travels back to the source station, RPR adopts destination stripping: a unicast
frame is removed from the ring as soon as it reaches the destination station.
•	Spatial reuse: On an RPR ring, frame transmission on any one link is
independent of frame transmission on other links. By supporting concurrent
per-ringlet transmissions, the bandwidth available to the stations on a ringlet
exceeds the individual link capacity. On nonoverlapping segments, concurrent
transfers of independent traffic are allowed; on overlapping segments,
bandwidth is allocated according to a bandwidth fairness algorithm.
•	Automatic bandwidth allocation: Different from the complex static bandwidth
allocation of SDH, RPR supports bursty traffic, allowing fast service
deployment.
•	No redundant bandwidth: Unlike SDH, RPR can transmit frames on both
ringlets without reserving bandwidth for protection; the two ringlets back each
other up to achieve self-recovery.
•	Support for broadcast/multicast: For a broadcast or multicast, only one copy
travels on the ring. The frame is copied and transited at each RPR station and
stripped from the ring when it travels back to the source station.
•	L2 rapid forwarding: As a station processes only the frames destined for it,
forwarding speed is improved.
2.2.3 Automatic Topology Discovery
Each station on an RPR ring uses topology and protection (TP) frames to
broadcast its topology and protection status information. After receiving the
information, other stations update their local topology databases, resulting in a
consistent topology database to be maintained on the ring.
When detecting a protection state change, a station sends eight TP frames at
intervals of 1 to 20 milliseconds (10 milliseconds by default). In addition, the station
sends TP frames periodically at intervals of 50 milliseconds to 10 seconds (100
milliseconds by default). This mechanism enables all stations on the ring to learn of
protection and topology changes in a timely, reliable manner, ensuring timely
protection switchover in addition to topology synchronization.
The automatic topology discovery mechanism of RPR makes an RPR station
plug-and-play: the station obtains the ring topology and is sensed by other stations
automatically.
2.2.4 Topology Protection and Self-Recovery
Figure 4 Path steering upon detection of a fault (A -> B)
Figure 5 Path wrapping upon detection of a fault (A -> B)
RPR can provide protection in response to fault, allowing services to recover
within 50 milliseconds. RPR provides two protection modes: steering and
wrapping.
In steering mode, a station, upon detecting a fault on a ringlet, broadcasts the
protection state change in TP frames on the ring. This also triggers ringlet selection.
When the other stations receive the TP frames, they transition to the corresponding
protection state, recalculate the reachability of the stations on the ring, and update
their ringlet selection tables to select the ringlet that retains connectivity to the
destination stations.
Unlike the steering mode, the wrapping mode does not involve a ringlet selection
update on the entire ring. Instead, the stations at the two sides of a point of failure
transition to the wrapping state upon detecting the failure, while other stations
transmit traffic along the old path. When the protected traffic arrives at the station on
one side of the point of failure, it is directed to the opposing healthy ringlet to reach
the station on the other side of the point of failure. The protected traffic then travels
the original ringlet to reach the destination.
Compared with the steering mode, the wrapping mode provides quicker
protection resulting in less frame loss but requires more bandwidth. To benefit
from both, the RPR implementation of the S9500 adopts the wrap-then-steer
mode. In this mode, RPR starts the wrapping mode once a link fails to ensure
continuity of the ongoing service and switches to the steering mode after the
topology converges to save bandwidth.
2.2.5 Fairness Algorithm
Resources on a ring network are shared among the stations. RPR provides a global
fairness algorithm on the entire ring to guarantee fair sharing and improve bandwidth
efficiency. The fairness algorithm can regulate traffic dynamically to minimize the
likelihood of congestion and to handle large bursts of traffic effectively, ensuring
normal use of the network.
To achieve bandwidth allocation fairness, each RPR station monitors the use of its
bandwidth, and an explicit backpressure mechanism operates between stations. With
this mechanism, a station notifies the upstream source stations of its currently
available capacity, having them regulate their traffic transmission. Thus, bandwidth
allocation fairness is achieved on the ring.
The fairness algorithm of RPR involves the following three aspects:
•	Determining the congestion threshold on a station
•	Determining the rate to advertise to the upstream station
•	Determining the traffic insertion rate on a station
When congestion occurs on a station, the station sends a congestion
advertisement on the ringlet opposite to the data transmission direction to
advertise a fair rate. Receiving the advertisement, the upstream station then
decreases the frame insertion rate down to the advertised fair rate. If congestion
also occurs on the current station, it does the same as its downstream station did.
Bandwidth management regulates low-priority data frames, but not high-priority
data frames or control frames to guarantee high-priority services. The bandwidth
management ability of RPR allows for bandwidth allocation efficiency and
fairness, which are impossible with Ethernet or other ring network technologies
where bandwidth management is not available.
The following figure illustrates how the fairness algorithm works on an RPR ring
comprising stations A, B, C, and D. Suppose the bandwidth of each link is 10
Gbps and traffic travels the outer ringlet.
Figure 6 Bandwidth fairness algorithm
The following is what occurs on the RPR ring:
(1) Both stations C and B send 4000 Mbps traffic to station D. They share
bandwidth on span C–D and represent 8-Gbps bandwidth in total. As the
link bandwidth is 10 Gbps, no congestion is present.
(2) Station A also sends 4000 Mbps traffic to station D. As a result, the total
traffic on span C–D reaches 12 Gbps, exceeding the maximum link
bandwidth (10 Gbps). Congestion thus occurs on span C–D.
(3) With the fairness algorithm, station C performs calculation immediately
after detecting the congestion and decreases the rate at which it puts traffic
onto the ring to 2000 Mbps. At the same time, it sends control frames to
station B backward along the inner ringlet to convey congestion and
fairness algorithm information.
(4) Upon receiving the fairness control frames, station B immediately
decreases its traffic rate and sends fairness control frames on to station A.
Following the fairness algorithm, stations C and B adjust their traffic rates to
3000 Mbps.
(5) After receiving the control frames, station A does the same.
As this process iterates, stations A, B, and C converge on a traffic rate of about 3300
Mbps each, sharing the bandwidth fairly.
In this example, absolute fairness is maintained. RPR, however, allows exclusive
bandwidth allocation and weighted bandwidth allocation. Thus, traffic rate can be
different at each station depending on its fairness weight.
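The convergence described above can be illustrated with an idealized simulation (equal weights, all demands above the fair share; this is a sketch of the idea, not the 802.17 fairness algorithm itself):

```python
def fair_converge(demands, capacity, rounds=20):
    """Iterative sketch of the fairness idea: while the link is congested, a
    fair rate is advertised and each sender caps its insertion rate at it."""
    rates = list(demands)
    for _ in range(rounds):
        if sum(rates) <= capacity:
            break                               # congestion resolved
        fair = capacity / len(rates)            # equal weights assumed
        rates = [min(d, fair) for d in demands]
    return rates

# Stations A, B, and C each offer 4000 Mbps toward D over a 10-Gbps span;
# each converges to one third of the link, about 3333 Mbps.
rates = fair_converge([4000, 4000, 4000], 10000)
assert all(abs(r - 10000 / 3) < 1e-6 for r in rates)
```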
2.2.6 QoS Guarantee
The capabilities of 50 milliseconds self-recovery, efficient bandwidth use, and
S9500 RPR Technology White Paper (V2.00)
Copyright © 2007 Hangzhou H3C Technologies Co., Ltd. Page 11/21
advanced RPR-Fa algorithm enables RPR to provide good QoS guarantee for
services, achieving high reliability, large throughput, low delay, and low loss rate .
RPR services fall into three classes: class A, class B, and class C, in descending
order of priority.
•	Class A: Provides low-jitter bandwidth guarantee to support TDM services. It is
subdivided into subclasses A0 and A1. For the subclass A0 service, bandwidth
is reserved on the entire ring, and unused reserved bandwidth cannot be used
by lower-priority services. For the subclass A1 and class B services, bandwidth
is reclaimable: unused bandwidth can be used by lower-priority services.
•	Class B: Provides low-delay bandwidth guarantee to transmit data in order of
priority. Class B is divided into two subclasses, committed information rate (CIR)
and excess information rate (EIR), that is, B-CIR and B-EIR.
•	Class C: Provides a best-effort service for traditional IP traffic.
RPR uses the Sc field to indicate the priority of an RPR frame.
For traffic to be forwarded, the RPR MAC layer adopts either a single transit
queue or dual transit queues. On a single-queue station, all services are put in a
first in first out (FIFO) queue regardless of their priorities. On a dual-queue
station, services are put either in the high-priority queue or in the low-priority
queue as follows:
•	Class A service: put in the high-priority queue.
•	Class B service: put in the low-priority queue. The subclass B-CIR service has
higher priority than the class C service and is not regulated by the fairness
algorithm. The subclass B-EIR service has the same priority as the class C
service and is regulated by the fairness algorithm.
•	Class C service: put in the low-priority queue.
A committed information rate is allocated for the class B service. Traffic conforming
to the CIR has higher priority than nonconforming (B-EIR) traffic. The RPR MAC
layer controls the traffic transmission order, which depends on the queue model.
- On a dual-queue station
The RPR MAC layer assigns traffic sent by the host to the host queue and traffic
to be forwarded for other stations to the transit queues. The RPR MAC layer
dequeues frames in the following order:
(1) Frames in the high-priority transit queue.
(2) Class A frames from the host, as long as the low-priority transit queue has not crossed a specified depth threshold; once the threshold is crossed, the frames in that queue are sent first.
(3) B-CIR frames.
(4) B-EIR and class C frames, subject to the fairness algorithm.
(5) Frames in the low-priority transit queue, if no higher-priority frames are waiting for transmission.
- On a single-queue station
Frames in the transit queue are transmitted first, regardless of priority. The class C and subclass B-EIR services are regulated by the fairness algorithm.
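The dequeue precedence described above can be illustrated with a minimal model. This sketch is not H3C code: the `DualQueueStation` class, its frame representation, and the omission of the host queue and the low-queue depth threshold are all simplifying assumptions.

```python
from collections import deque

class DualQueueStation:
    """Minimal model of a dual-queue RPR station's transit path.
    Class A frames go to the high-priority transit queue; class B
    and class C frames go to the low-priority transit queue."""

    def __init__(self):
        self.high = deque()  # class A transit frames
        self.low = deque()   # class B and class C transit frames

    def enqueue_transit(self, frame):
        # frame is a dict with a "cls" key: "A", "B", or "C".
        (self.high if frame["cls"] == "A" else self.low).append(frame)

    def dequeue(self):
        # High-priority transit frames are always sent first; host
        # traffic and the depth threshold are not modeled here.
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None
```

On a single-queue station the model degenerates to one FIFO, matching the behavior described above.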
RPR supports bandwidth reservation, providing a strong QoS guarantee for reserved bandwidth. This allows traditional voice services to be carried.
Because the fairness algorithm of RPR does not regulate high-priority services, high-priority traffic is always sent before low-priority traffic. To prevent excessive high-priority traffic from affecting low-priority services, it is recommended that you set a threshold on high-priority traffic.
RPR provides static traffic shaping methods, such as rate limiting, for high-priority and low-priority data frames. For low-priority data frames, RPR also provides dynamic traffic shaping.
These QoS measures ensure that excellent QoS can be provided on an RPR ring even if the host provides no QoS guarantee.
2.2.7 Others
In addition to the features described in the preceding sections, RPR also delivers
features described in this section.
As a physical-layer-independent MAC layer protocol, RPR can run over physical layers such as Ethernet, DWDM, and SONET/SDH.
RPR allows for great bandwidth scalability. For example, you can scale RPR ring bandwidth from 155 Mbps to 10 Gbps, and even to 40 Gbps.
A very important feature of RPR is that it avoids the N² full-mesh issue, achieving full connectivity at the MAC layer for N stations by using only N links.
Compared with SDH, POS and Ethernet, RPR has lower link cost.
RPR is an optimized Ethernet technology. It supports all Ethernet protocols and
services.
RPR supports equipment interoperability at the ring level. For example, you can
connect ATM devices, routers, and TDM devices to the same RPR ring. These
networks share the physical links and total bandwidth of the ring while being
transparent to each other.
RPR provides complete MIB features, giving it a solid operations and maintenance platform for operability and manageability.
2.3 RPR Data Frame Format
Figure 7 RPR frame structure
There are two types of RPR frames: basic data frames and extended data
frames. Suppose station C sends data to station D on the RPR ring shown in the
above figure.
When stations A, B, C and D are located in the same VLAN, the data frames are
extended frames and are forwarded at Layer 2.
When stations C and D belong to a VLAN different from the one to which stations
A and B belong, the data frames are basic frames and are forwarded at Layer 3.
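The same-VLAN rule above can be captured in a few lines. This is an illustrative sketch only; the function name is an assumption, and the real frame-type decision is part of the full forwarding process.

```python
def rpr_frame_type(src_vlan: int, dst_vlan: int) -> str:
    """Return the RPR data frame type used between two stations,
    per the rule described above (illustrative simplification)."""
    # Stations in the same VLAN exchange extended frames (Layer 2).
    if src_vlan == dst_vlan:
        return "extended"
    # Stations in different VLANs exchange basic frames (Layer 3).
    return "basic"
```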
3 RPR Applications
RPR stations can insert both Layer 3 traffic and Layer 2 traffic onto an RPR ring.
For a Layer 3 RPR application, Layer 3 interfaces (VLAN interfaces on the S9500)
must be configured; for a Layer 2 RPR application, RPR logical interfaces must
trunk the VLAN(s) to which the Layer 2 data streams belong.
3.1 Layer 3 Application
Assign the RPR logical interfaces on a ring to the same VLAN, for which a VLAN
interface must be created on each RPR station. Assign the VLAN interfaces IP
addresses in the same network segment. Thus, from the perspective of the
service layer on a station, any other station on the ring is the direct next hop.
On an RPR station, service traffic arriving at other service boards is passed
through the VLAN interface to the RPR board where the service traffic is inserted
onto the RPR ring.
RPR supports various routing protocols. A protection switch, caused by a fiber cut for example, does not result in route reconvergence or MPLS LSP re-establishment. This is because two paths are available to a destination: when one fails, traffic can travel the other to reach the destination. The protection switch may complete within 50 milliseconds, far less than the hello interval of service layer
protocol neighbors. Moreover, the DOWN event of a physical port will not be
notified to the service protocol layer, unless both physical ports come down.
Therefore, using an RPR ring to connect distribution and access services can
provide the reliability and stability that are impossible with STP or other ring
network technologies where path switchover can result in update at the service
layer involving routing, MPLS LSP, ARP, MAC, and so on.
RPR rings support the Virtual Router Redundancy Protocol (VRRP). You can
achieve node protection by assigning two stations on an RPR ring to the same
VRRP group. Thus, when the master station fails or fiber cut occurs on the spans
of the station, the backup station takes over and advertises attribute discovery
(ATD) frames to direct the traffic destined for the VRRP group to it.
As shown in Figure 8 , the two common stations, also the egresses of the RPR
ring at the distribution layer, form a VRRP standby group, which is assigned a
virtual IP address. When the master fails, the backup station takes over to
forward traffic destined for this virtual IP address. The RPR module and the VRRP module work together as follows to ensure that switchover occurs in the VRRP group within 50 milliseconds:
Immediately after detecting that the master has disappeared from the RPR ring
through the fast RPR detection mechanism, the backup station issues the virtual
MAC of the VRRP group locally and advertises throughout the ring that it is now
the master of the VRRP group to direct traffic to it. The process occurs while the
VRRP module is handling the switchover situation.
Figure 8 Network diagram for a Layer 3 application with RPR
3.2 Layer 2 Application
When an RPR station receives a frame belonging to a VLAN carried on it, it
encapsulates the frame in extended RPR frame format and forwards the RPR
frame based on the MAC address onto the ring. Similar to Ethernet ports, RPR
ports can trunk multiple VLANs to provide multi-VLAN access, supporting both
Layer 2 and Layer 3 services.
RPR supports ring interconnect, including ring intersection, ring tangent, and ring link. In ring intersection mode, two RPR rings are interconnected at a single station installed with two RPR boards. In ring link mode, RPR stations are connected with other boards, GE boards for example, to form a redundant connection, ensuring that a redundant path is available when faults occur on both RPR ringlets. In both ring link mode and ring intersection mode, a Layer 2 loop may form. You need to use some mechanism, STP on RPR ports for example, to remove the loops.
Figure 9 Network diagram for an RPR L2 application
You may configure VLAN-based tunnels on an RPR ring for transparent
transmission purpose. As shown in the above figure, you can tunnel VLAN 60
traffic from device A to device B by configuring a tunnel on their connecting RPR
stations. The tunnel created on either station must take the other station as the
destination. In addition, if the VLAN for which the tunnel is created contains only
one access port in addition to the RPR logical port, disable MAC address
learning in the VLAN.
As shown in the above figure, you may configure a tunnel on the RPR station connected to device F to tunnel the traffic from device F to the egress on the 2.5 Gbps ring. As two redundant egresses are available, you can configure the tunnel to switch over to the backup egress station when the primary one fails.
When tunnels are used on the RPR ring, the use of default ringlet selection can
cause a ringlet to be saturated. To address the problem, you can configure
VLAN-based ringlet selection to distribute traffic.
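Conceptually, VLAN-based ringlet selection is a static VLAN-to-ringlet map consulted before the default selection. A minimal sketch, assuming a plain dictionary holds the operator-configured entries:

```python
def select_ringlet(static_map: dict, vlan_id: int, default_ringlet: int = 0) -> int:
    """Return the ringlet (0 or 1) for a VLAN: a configured static
    entry wins; otherwise fall back to the default selection."""
    return static_map.get(vlan_id, default_ringlet)

# Example: offload VLAN 60 onto ringlet 1 while VLAN 50 stays on ringlet 0,
# relieving a saturated ringlet. Values are illustrative.
static_entries = {60: 1, 50: 0}
```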
Tunneling is suitable for point-to-point service interconnection. For multi-point
service interconnection, use Layer 2 bridging. As shown in the above figure,
devices C, D and E are in VLAN 50. To bridge traffic between them through the
RPR ring, you must configure VLAN 50 on the corresponding RPR stations and
enable MAC address learning in the VLAN.
On the 10-Gbps RPR and 2.5-Gbps RPR rings, you can adopt VPLS plus RPR
to achieve multi-point interconnection.
Figure 10 VPLS tunneling on an RPR ring
As shown in the above figure, VPLS tunnels are established on the RPR ring to
implement L2 interconnection between A, B, C and D through MAC address
learning.
4 RPR Features on the S9500
4.1 Powerful Service Switching Performance
According to the RPR protocol, fault detection should complete within 10 milliseconds and service switching within 50 milliseconds. The RPR implementation on the S9500, however, provides service switching within 20 milliseconds, fully satisfying carrier-class requirements and reflecting the strength of RPR.
4.2 Complete QoS Capabilities
The RPR implementation on the S9500 provides complete QoS capabilities for
RPR traffic. It supports the access control list (ACL), rate limiting, traffic shaping,
queuing, and almost all QoS features available on Ethernet. It supports
mappings from COS, EXP and DSCP priorities to RPR priorities. Depending on
customer needs, it can support services of Class A, B and C, providing
bandwidth guarantee and other service differentiation capabilities. In addition,
the RPR implementation on the S9500 uses a fairness algorithm to ensure fair
access of stations to ring bandwidth, allowing for bandwidth use efficiency,
congestion avoidance and congestion pre-warning.
4.3 Abundant Ring Selection Mechanisms
The RPR implementation on the S9500 supports multiple ring selection modes.
By default, dynamic ringlet selection (shortest path selection) is adopted on an
RPR ring. Dynamic ringlet selection results in a ringlet selection table that
contains the shortest paths to other RPR stations on the ring upon topology
convergence. This ringlet selection table does not change when the topology is
stable.
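Dynamic ringlet selection amounts to computing, per destination, which ringlet offers fewer hops, and caching the result at topology convergence. The sketch below is illustrative only: it assumes stations numbered consecutively around the ring and arbitrarily labels the increasing-index direction as ringlet 0.

```python
def shortest_ringlet(n_stations: int, src: int, dst: int) -> int:
    """Pick the ringlet with the shorter path from src to dst on a
    ring of n_stations; ties go to ringlet 0 (an assumption)."""
    hops_r0 = (dst - src) % n_stations  # hops in the ringlet-0 direction
    hops_r1 = (src - dst) % n_stations  # hops in the ringlet-1 direction
    return 0 if hops_r0 <= hops_r1 else 1

def build_selection_table(n_stations: int, src: int) -> dict:
    """Ringlet selection table built once at topology convergence;
    it stays unchanged while the topology is stable."""
    return {dst: shortest_ringlet(n_stations, src, dst)
            for dst in range(n_stations) if dst != src}
```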
In addition, you can configure static ringlet selection entries, which take priority over dynamic entries. For Layer 2 tunneling, VLAN-based ringlet
selection is supported.
4.4 Layer 2 Bridging + L2 Tunneling
According to the RPR protocol, frames destined for or sourced from a MAC address not on the RPR ring are flooded. All the stations on the ring can therefore receive the traffic, wasting bandwidth, affecting the fairness algorithm, and potentially compromising data security.
The RPR implementation on the S9500 supports Layer 2 tunneling, allowing
traffic to be tunneled per VLAN. Ethernet frames in a VLAN can be transparently
transmitted between stations, and the stations do not need to learn MAC
addresses in the VLAN. Layer 2 tunneling is thus suitable for point-to-point
interconnection on the ring.
The Layer 2 bridging function is primarily provided on GE RPR rings, while VLAN-based tunneling is primarily adopted on 10-Gbps or 2.5-Gbps RPR rings. On a 10-Gbps or 2.5-Gbps RPR ring, you can achieve Layer 2 bridging by combining VPLS with RPR. On a GE RPR ring, a maximum of 128K MAC addresses is supported.
4.5 Compatibility with Ethernet Protection Mechanisms
The RPR interfaces on the S9500 support the Spanning Tree Protocol (STP).
They can participate in STP calculation like common Ethernet ports to eliminate
loops. With STP, RPR can remove loops at Layer 2 (if any) on intersecting rings
and ring links, delivering high reliability.
RPR can work with VRRP to achieve backup between stations on an RPR ring, thus improving reliability. On the S9500, a special procedure ensures that the switchover between the master and the backup in a VRRP group on an RPR ring completes within 50 milliseconds. Such node protection mitigates the impact of extreme events such as power failure of RPR boards.
4.6 Complete Clock Schemes
On a 10-Gbps RPR ring, as 10 GE LAN mode is supported, the RPR stations do
not need to synchronize the clocks.
In 10-Gbps POS mode, RPR resolves the clock synchronization loss between
stations by having hardware automatically insert idle frames.
In addition, the S9500 provides a special RPR clock switching mechanism. With this mechanism, a master station uses a clock card to keep synchronization with the reference clock source. The clocks on all other stations on the RPR ring synchronize with the clock of the master station and automatically switch to adapt to topology changes. Thus, on an RPR ring, the clocks on the side toward the master station are locked to a common source.
4.7 Ease of Configuration
On an RPR station on the S9500, traffic transmission is done through a pair of
RPR physical ports. To simplify configuration, an RPR logical interface is used as the equivalent of the pair of RPR physical ports. This logical interface can be configured like a common Ethernet port. You can bring up the service simply by
configuring some VLAN settings on the interface. The two RPR physical ports
are transparent to the service layer. All service-related configurations are done
on the RPR logical interface rather than on the two physical ports respectively.
4.8 RPR Implementation on the S9500
Figure 11 RPR implementation on the S9500
On the S9500, RPR is implemented through an RPR board, which can be GE,
2.5 Gbps, or 10 Gbps. The RPR board exchanges traffic with other boards
through a cross bar. A 10-Gbps RPR board supports 10-Gbps traffic insert/copy
at wire speed and copy of bursty traffic exceeding 10 Gbps. The services
between RPR rings are switched between RPR boards through the cross bar.
Copyright ©2007 Hangzhou H3C Technologies Co., Ltd. All rights reserved.
No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of
Hangzhou H3C Technologies Co., Ltd.
The information in this document is subject to change without notice.
S9500 Active SRPU and Standby SRPU Switchover Technology White Paper
Hangzhou H3C Technologies Co., Ltd. 1/12
H3C S9500 Active SRPU and Standby SRPU
Switchover Technology White Paper
Keywords: Active and standby SRPU switchover, HA, High Availability
Abstract: HA, High Availability, is an indispensable feature of carrier-class devices. Active and
standby SRPU switchover is one of the most important implementations of the HA feature.
This manual introduces the active and standby SRPU switchover implementation on the
S9500 series switches.
Acronyms:
Acronym Full spelling
HA High Availability
Table of Contents
1 Overview .................................................................................................................................. 3
2 Active and Standby SRPU Switchover Mechanism................................................................. 3
2.1 Active and Standby SRPU Switchover Mechanism Overview ...................................... 3
2.1.1 Introduction to the Switchover Process .............................................................. 3
2.1.2 Introduction to the State Machine ....................................................................... 4
2.2 Active and Standby Switchover Triggering Mechanism ................................................ 7
2.2.1 Active and Standby Status Determination........................................................... 7
2.2.2 Active and Standby Switchover Triggering Mechanism...................................... 8
2.3 Registration Mechanism................................................................................................ 9
3 Active and Standby Performance ............................................................................................ 9
3.1 Configuration Layer Active and Standby Performance ................................................. 9
3.2 Protocol Layer Active and Standby Performance ......................................................... 9
3.2.1 Introduction to Graceful Restart ........................................................................ 10
3.2.2 Layer 2 Unicast Forwarding .............................................................................. 11
3.2.3 Layer 2 Multicast Forwarding............................................................................ 11
3.2.4 Layer 3 Unicast Forwarding .............................................................................. 11
3.2.5 Layer 3 Multicast Forwarding............................................................................ 12
3.2.6 MPLS/VPN ........................................................................................................ 12
1 Overview
High Availability (HA) is an indispensable feature of carrier-class devices. The HA feature can be used to achieve a higher degree of system availability. When it detects system faults, HA can quickly and correctly restore normal operation of the system, thus shortening the Mean Time to Repair (MTTR) of the system.
The S9500 series switches support the HA feature; active and standby SRPU switchover, for example, is one of its implementations. Devices supporting HA are generally equipped with two SRPUs, one being the active SRPU and the other the standby SRPU. The active SRPU communicates with the external network to implement the normal functions of each module in the system, while the standby SRPU works as a backup of the active SRPU and does not communicate with the external network. When the active SRPU works abnormally, the system performs a switchover automatically, and the standby SRPU takes over the responsibilities of the active SRPU to ensure normal services.
Refer to H3C S9500 HA Technology White Paper for the introduction to the HA
feature of the S9500 series switches.
2 Active and Standby SRPU Switchover Mechanism
2.1 Active and Standby SRPU Switchover Mechanism Overview
2.1.1 Introduction to the Switchover Process
The switchover process of the active and standby SRPUs involves three phases: backing up data in batches, backing up data in real time, and synchronizing data.
- After the standby SRPU is started, the active SRPU synchronizes the backup data of all modules to the standby SRPU. This process is called backing up data in batches.
- After backup in batches is finished, the system begins the real-time backup process, during which data is backed up to the standby SRPU whenever the backup data on the active SRPU changes.
- After the switchover between the active and standby SRPUs, the standby SRPU becomes the new active SRPU and tells each module to collect and synchronize data from the service boards. This process is called data synchronization. During data synchronization, each module communicates with the service boards to confirm and synchronize hardware status, link layer status, and configuration data, ensuring the consistency of the data and status maintained by the system so that the system works normally after the switchover. Only after data synchronization is completed does the standby SRPU become the active SRPU completely.
2.1.2 Introduction to the State Machine
The active SRPU changes its state in the following order: Wait for standby SRPU insertion, Wait for backup request, Back up data in batches, Back up data in real time, and Synchronize data.
The standby SRPU changes its state in the following order: Be ready, Receive data in batches, and Receive data in real time.
The Synchronize data state lies between the standby state and the active state. In this state, the standby SRPU has become the active SRPU in hardware, but it still needs to collect and synchronize data from the service boards, so it has not yet become the active SRPU completely. The system prompts the following:
System is busy with warm backup, please wait ...
The standby SRPU can completely become the active SRPU only after data synchronization is completed.
The following figure shows the state change of the active and standby SRPUs:
Figure 1 State machine of active and standby SRPU switchover
After being started, the active SRPU is in the Wait for standby SRPU insertion state. With a single SRPU, the state machine stays in this state. The standby SRPU is in the Be ready state after being started. After sending the In position message to the active SRPU, the standby SRPU waits for the state change timer to time out. See Figure 2 for this process.
Upon receiving the In position message from the standby SRPU, the active SRPU changes its state to Wait for backup request.
After the state change timer times out, the standby SRPU sends a Back up data in batches request to the active SRPU and changes its state to Receive data in batches. Upon receiving the request, the active SRPU changes its state to Back up data in batches, begins to collect the data of each module, and synchronizes the data to the standby SRPU.
After the backup of each module on the active SRPU is completed, the active SRPU sends a Backup completed message to the standby SRPU and changes its state to
Back up data in real time. Upon receiving the Backup completed message, the standby SRPU changes its state to Receive data in real time. After entering the real-time backup state, the active and standby SRPUs are in a relatively stable state, as shown in Figure 2.
Figure 2 Active and standby switchover
When detecting that the active SRPU is not in position or is resetting, the standby SRPU becomes the active SRPU. At this time, however, the software on the standby SRPU has not yet met the requirements for becoming the active SRPU. Therefore, the standby SRPU first enters the Synchronize data state, during which the new active SRPU collects and synchronizes data from the service boards. After data synchronization is completed, the new active SRPU becomes the active SRPU completely, and its state machine enters the Wait for standby insertion state. The original active SRPU reboots, becomes the standby SRPU, and enters the Be ready state.
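The transitions above can be summarized as a small event-driven state machine. This is an illustrative model only: the state and event names are paraphrased from the text, and the standby SRPU's self-reset behavior is not modeled.

```python
# Paraphrased standby-side transitions from the description above.
STANDBY_TRANSITIONS = {
    ("be_ready", "timer_expired"): "receive_data_in_batches",
    ("receive_data_in_batches", "backup_completed"): "receive_data_in_real_time",
    ("receive_data_in_real_time", "active_srpu_failed"): "synchronize_data",
    ("synchronize_data", "sync_completed"): "wait_for_standby_insertion",
}

def step(state: str, event: str) -> str:
    """Advance the state machine by one event; unknown (state, event)
    pairs leave the state unchanged (a simplification)."""
    return STANDBY_TRANSITIONS.get((state, event), state)
```

The final transition models the former standby SRPU completing data synchronization and taking over as the active SRPU.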
Caution:
The standby SRPU can only change from the Receive data in real time state to the
Synchronize data state. Before entering the Receive data in real time state, the
standby SRPU resets itself upon detecting that the active SRPU is not in position,
because it has not yet collected all the information of the system.
2.2 Active and Standby Switchover Triggering Mechanism
2.2.1 Active and Standby Status Determination
In case of double SRPUs, whether an SRPU is the active SRPU or the standby SRPU is determined by the hardware during device startup. Generally, the device selects the SRPU with the smaller slot number as the active SRPU. (In case of double SRPUs, the hardware sets a delay time on the SRPU with the bigger slot number to make it start up later.)
Upon initial startup, both SRPUs are in the standby state and perform software startup separately. The SRPU with the smaller slot number sets its state to normal shortly after startup and checks the state of the other SRPU, while the SRPU with the bigger slot number checks whether the other SRPU is normal after a 15-second delay and then sets its own state. Because the SRPU with the smaller slot number becomes normal while the state of the other SRPU is still abnormal, the SRPU with the smaller slot number enters the active state; the SRPU with the bigger slot number, finding the other SRPU normal after its 15-second delay, sets its state to standby. Therefore, with double SRPUs, even if the SRPU with the bigger slot number was the active SRPU before a system reboot, the SRPU with the smaller slot number is the active SRPU after the reboot.
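The slot-number rule reduces to a simple election. A minimal sketch, assuming slot numbers are plain integers; the 15-second startup delay is represented only in a comment.

```python
def elect_active(slot_numbers):
    """Apply the election rule described above: the SRPU in the
    smaller-numbered slot becomes active; the others (delayed at
    startup by the hardware, e.g. by 15 seconds) settle into standby."""
    active = min(slot_numbers)
    standby = sorted(s for s in slot_numbers if s != active)
    return active, standby
```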
Caution:
In case of double SRPUs, you need to ensure that the software and hardware versions
of the active and standby SRPUs are consistent.
2.2.2 Active and Standby Switchover Triggering Mechanism
After entering the Receive data in real time state, the standby SRPU becomes the active SRPU upon detecting a switchover notification. The notification is triggered by a hardware interrupt, and the hardware switchover between the active and standby SRPUs completes in milliseconds. After the hardware switchover, the new active SRPU enters the Synchronize data state.
Active and standby switchover is triggered for the following reasons:
- The active and standby switchover command is executed at the command line.
- The active SRPU works abnormally.
- The active SRPU resets or is removed.
- An abnormal software reboot occurs on the active SRPU, for example, a hardware watchdog reboot because a module occupied the CPU for too long, or a reboot caused by abnormal data access or command access.
Meanwhile, after entering the Back up data in real time state, the active and standby SRPUs periodically send handshake packets.
Figure 3 Active and standby handshake process
Each SRPU sends handshake packets to its peer at an interval of one second.
If the standby SRPU does not receive handshake packets from the active SRPU within 120 seconds, it considers that the connection with the active SRPU has failed and resets itself.
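The handshake supervision amounts to a one-second send timer and a 120-second loss timer. The sketch below uses simulated timestamps rather than real timers, and treating exactly 120 seconds of silence as a failure is an assumption about the boundary case.

```python
HANDSHAKE_INTERVAL_S = 1    # each SRPU sends a handshake every second
HANDSHAKE_TIMEOUT_S = 120   # silence threshold before the standby resets

def standby_should_reset(last_rx_time: float, now: float) -> bool:
    """True when the standby SRPU has heard nothing from the active
    SRPU for the full timeout and should reset itself."""
    return (now - last_rx_time) >= HANDSHAKE_TIMEOUT_S
```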
2.3 Registration Mechanism
The software modules take corresponding actions as the state machine changes, and they register these actions with the system through the registration mechanism. When the state changes, the system invokes the registered actions of the modules in turn, based on their priorities.
The switchover speed of the active and standby state machine depends on the processing time of each module.
For example, if the configuration file is large, the configuration module takes a relatively long time to back up its information in batches, increasing the overall batch backup time.
The data synchronization duration is longer when the device is fully configured than when it is not, because the device management module must collect the board information and port link status information.
3 Active and Standby Performance
3.1 Configuration Layer Active and Standby Performance
Through the batch backup and real-time backup processes, the configuration information on the active SRPU is kept backed up on the standby SRPU. Therefore, at the configuration layer, data is smoothly synchronized during active and standby SRPU switchover.
3.2 Protocol Layer Active and Standby Performance
During the data synchronization process, the forwarding tables on the service boards are not deleted and re-learned, ensuring non-stop forwarding of services. While the SRPU collects and synchronizes data, the original data on
the SRPU remains unchanged, and only the changed data is updated.
3.2.1 Introduction to Graceful Restart
The control software and forwarding software of an S9500 switch are separated from each other. The control plane controls and manages the whole device, discovering routes and delivering them to the interface boards. The forwarding plane is dedicated to data forwarding. Each plane has its own processor, and the two planes are functionally independent. The software on the active SRPU is the control software, which processes user configuration and runs various protocols; for example, it runs routing protocols such as OSPF, IS-IS, and BGP to discover routes and applies these routes to each interface board. The software on each interface board is the forwarding software, which maintains its forwarding table according to the notifications of the active SRPU and forwards data based on the forwarding table.
With this distributed structure, when the control software restarts (because of hardware or software faults) or is reloaded (for a software upgrade), the forwarding services are not interrupted: control software restart or reload does not affect the normal running of the forwarding software. Therefore, as long as the network topology remains stable during the control software restart or reload, data forwarding on the rebooting router is feasible and reliable.
The problem is that each time the control software restarts or is reloaded, all the routing protocols have to restart, the neighbor relationships between the device and its adjacent devices have to be rebuilt, and all the routing information databases have to be re-synchronized. Neighbor relationship interruption triggers route recalculation on the adjacent devices, causing route oscillation and forwarding interruption on the network. To solve this problem, the IETF has proposed a series of enhancements for different routing protocols, such as IS-IS, OSPF, BGP, and LDP. With these enhancements, the original protocol flows are improved. When the control plane restarts on the device, the device notifies its
neighbors to temporarily preserve the routing information and adjacency relationship
with the device. After the protocol restarts, the neighbors will help the restarting
device to update routing information and to restore it to the state prior to the restart in
minimal time. No route flapping occurs during the restart, the packet forwarding path
S9500 Active SRPU and Standby SRPU Switchover Technology White Paper
Hangzhou H3C Technologies Co., Ltd. 11/12
remains the same, and the whole system can forward data continuously. Hence, it is
called “Graceful Restart”, and sometimes called “non-stop forwarding (NSF)”.
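The flow above, seen from a neighbor's side, can be sketched as follows (hypothetical code; the grace period value and data layout are illustrative, not H3C defaults): on a restart notification the neighbor marks the peer's routes stale but keeps forwarding with them, and when the peer re-synchronizes, routes it re-advertises are refreshed while the rest are flushed.

```python
# Hypothetical sketch of Graceful Restart from a helper neighbor's view.

GRACE_PERIOD = 120  # seconds; illustrative value only

class Neighbor:
    def __init__(self):
        self.routes = {}                  # prefix -> {"stale": bool}

    def learn(self, prefix):
        self.routes[prefix] = {"stale": False}

    def peer_restarting(self):
        """Restart notification: preserve routes instead of withdrawing them."""
        for info in self.routes.values():
            info["stale"] = True          # no route recalculation triggered

    def peer_resynced(self, readvertised):
        """Peer is back: refresh re-advertised routes, flush the rest."""
        for prefix in readvertised:
            self.routes[prefix]["stale"] = False
        self.routes = {p: i for p, i in self.routes.items()
                       if not i["stale"]}  # grace timer expiry

n = Neighbor()
n.learn("10.1.0.0/16")
n.learn("10.2.0.0/16")
n.peer_restarting()
assert len(n.routes) == 2                 # still forwarding on both routes
n.peer_resynced(["10.1.0.0/16"])          # only one route re-advertised
assert list(n.routes) == ["10.1.0.0/16"]  # the other is flushed
```

This captures the essential GR contract: the neighbor's forwarding state is preserved across the peer's restart, and only routes that fail to come back within the grace period are removed.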
3.2.2 Layer 2 Unicast Forwarding
In Layer 2 unicast forwarding, the MAC address table concerned resides on the
service board. When an active/standby SRPU switchover starts, the new active SRPU
collects and synchronizes MAC address information from the service board. Because
the original MAC address table on the service board is not deleted during this
process, normal Layer 2 unicast data forwarding on the service board is ensured.
Caution:
For cross-board or cross-chip Layer 2 forwarding, temporary packet loss may occur during data synchronization because the routing table entries on the standby SRPU have not yet been correctly updated.
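The switchover-time synchronization described above can be sketched as a delta merge (hypothetical code): the new active SRPU rebuilds its view from the service board's MAC table, only entries that actually differ are rewritten, and the board's own table is never flushed, so unicast forwarding is undisturbed throughout.

```python
# Hypothetical sketch: the new active SRPU merges the service board's
# MAC table into its own view without ever deleting board entries.

def synchronize(board_macs, srpu_macs):
    """Merge board entries into the SRPU view; return the entries rewritten."""
    changed = []
    for mac, port in board_macs.items():
        if srpu_macs.get(mac) != port:    # update only what differs
            srpu_macs[mac] = port
            changed.append(mac)
    return changed

board = {"0001-0001-0001": 1, "0002-0002-0002": 5}   # board table intact
new_active = {"0001-0001-0001": 1}                   # partial view after switchover
changed = synchronize(board, new_active)
assert changed == ["0002-0002-0002"]      # only the delta was written
assert new_active == board                # SRPU view converges to the board
```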
3.2.3 Layer 2 Multicast Forwarding
For Layer 2 multicast forwarding, the multicast MAC address entries needed for
forwarding are saved on the service board. When an active/standby SRPU switchover
is performed, the original multicast MAC address entries on the service board
remain unchanged while the SRPUs collect data from the interface boards, so Layer
2 multicast streams continue to be forwarded.
Caution:
During the data synchronization process, packet loss may occur because the multicast protocol will update the entries.
3.2.4 Layer 3 Unicast Forwarding
For Layer 3 unicast forwarding, data forwarding is implemented through ARP and the
FIB. During the data synchronization between the active and standby SRPUs, the
original ARP and FIB entries remain unchanged. Meanwhile, because routing
protocols such as OSPF, BGP, and IS-IS support the GR function, the adjacent
neighbors keep their neighbor relationships and routing tables unchanged, so the
forwarding paths of IP packets in the network remain the same, ensuring continuity
of Layer 3 unicast forwarding services.
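The two lookups named above can be sketched briefly (hypothetical code and addresses): the FIB maps a destination to a next hop via longest prefix match, and the ARP table maps that next hop to a MAC address. Both tables live on the interface board and survive an SRPU switchover unchanged.

```python
# Hypothetical sketch of Layer 3 unicast forwarding via FIB + ARP.
import ipaddress

FIB = {"10.0.0.0/8": "192.168.1.1", "10.1.0.0/16": "192.168.1.2"}
ARP = {"192.168.1.1": "000f-e200-0001", "192.168.1.2": "000f-e200-0002"}

def next_hop(dst):
    """Longest prefix match over the FIB."""
    best = max((n for n in FIB
                if ipaddress.ip_address(dst) in ipaddress.ip_network(n)),
               key=lambda n: ipaddress.ip_network(n).prefixlen,
               default=None)
    return FIB.get(best)

hop = next_hop("10.1.2.3")
assert hop == "192.168.1.2"               # the more specific /16 wins
assert ARP[hop] == "000f-e200-0002"       # MAC used to rewrite the frame
```

Because both dictionaries represent state held on the interface board, a control-plane restart on the SRPU leaves every lookup here unaffected, which is the point the section makes.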
3.2.5 Layer 3 Multicast Forwarding
For Layer 3 multicast forwarding, the original multicast streams continue to be
forwarded during data synchronization because the multicast entries already
established on the interface boards remain unchanged.
Caution:
During the data synchronization process, packet loss may occur because the multicast protocol will update the entries.
3.2.6 MPLS/VPN
For MPLS/VPN services, the original MPLS/VPN service streams continue to be
forwarded because the original MPLS/VPN table entries on the interface boards
remain unchanged during data synchronization.
Copyright ©2007 Hangzhou H3C Technologies Co., Ltd. All rights reserved.
No part of this manual may be reproduced or transmitted in any form or by any means without prior written consent of
Hangzhou H3C Technologies Co., Ltd.
The information in this document is subject to change without notice.