Hardware-Accelerated Signaling: Design,
Implementation and Implications
DISSERTATION
for the Degree of
DOCTOR OF PHILOSOPHY
(Electrical Engineering)
HAOBO WANG
November 2004
Hardware-Accelerated Signaling: Design,
Implementation and Implications
DISSERTATION
Submitted in Partial Fulfillment
of the REQUIREMENTS for the
Degree of
DOCTOR OF PHILOSOPHY (Electrical Engineering)
at the
POLYTECHNIC UNIVERSITY
by
Haobo Wang
November 2004
Approved:
_____________________________
Department Head
_________, 20___
Date

Copy No. ___
Approved by the Guidance Committee:
Major: Electrical Engineering
______________________
Malathi Veeraraghavan
Associate Professor of Electrical Engineering
______________________
Date

______________________
Ramesh Karri
Associate Professor of Electrical Engineering
______________________
Date

______________________
Shivendra Panwar
Professor of Electrical Engineering
______________________
Date
Minor: Computer Science
______________________
Haldun Hadimioglu
Associate Professor of Computer Science
______________________
Date
Microfilm or other copies of this dissertation are obtainable from
UMI Dissertations Publishing
Bell & Howell Information and Learning
300 North Zeeb Road
P. O. Box 1346
Ann Arbor, Michigan 48106-1346
VITA

Mr. Haobo Wang received his B.Sc. and M.Sc. degrees in Electrical Engineering from the University of Electronic Science and Technology of China, Chengdu, China, in 1997 and 2000, respectively. From April 2000 to August 2000, he worked for Lucent Technologies, China, as a hardware (ASIC) design engineer.
Since September 2000, Mr. Wang has conducted research towards his Ph.D. degree in Electrical Engineering at Polytechnic University, where he has been a research fellow of the Center of Advanced Technology in Telecommunications (CATT). The work presented in this dissertation was undertaken between January 2001 and November 2004 under the supervision of Professor Malathi Veeraraghavan and Professor Ramesh Karri.
Mr. Wang is a student member of the Institute of Electrical and Electronics Engineers
(IEEE), a student member of the IEEE Communications Society (ComSoc), and a student
member of the International Society for Optical Engineering (SPIE).
ACKNOWLEDGMENTS

I would like to express my deepest gratitude to the numerous people who have been
instrumental in the completion of this dissertation. First and foremost, I want to thank my
advisors Professor Malathi Veeraraghavan and Professor Ramesh Karri for their guidance
and encouragement throughout the course of my Ph.D. program.
Professor Veeraraghavan tirelessly guided me in every step and aspect of my research
work, from the selection of my research topic to the final preparation of my dissertation
defense. Beyond her immense assistance with my research work, she helped shape me as a researcher, teaching me fundamentals ranging from better English writing skills to performing complex theoretical analyses. She
has been the single biggest motivating factor during these important years of my life. I especially want to thank Professor Karri for devoting his time and energy to my everyday research work during the last two years.
I would also like to thank the other members of my dissertation guidance committee,
Professor Shivendra S. Panwar and Professor Haldun Hadimioglu, for their thoughtful and
insightful advice. Professor Panwar gave me valuable comments on the applications of
hardware-accelerated signaling, which greatly enriched my dissertation. Professor Hadimioglu took the time to proofread the dissertation and helped iron out many mistakes, both technical and grammatical.
Many thanks go to my colleagues in the Wireless Networks lab and the CAD Research
lab. It has always been enjoyable to discuss technical and non-technical issues with them.
I especially thank Tao Li, Dr. Xuan Zheng, and Zhifeng Tao with whom I spent my first
two years at Poly. I also thank Dr. Kaijie Wu, Bo Yang, Dr. Piyush Mishra, Nikhil Joshi,
and Tongquan Wei, with whom I have had a great time during the past two years. These wonderful times have resulted in great friendships, both technical and personal. The journey through these past four years of graduate life at Poly has been one of the best experiences of my life.
I owe a huge debt of gratitude to my family: my parents and my sister Minmei, for
their dedicated and endless love and support not only throughout the course of my Ph.D.
studies, but my entire life. Without them, I could not have achieved anything that I have
achieved today.
Finally, I thank the National Science Foundation (NSF) and the Center of Advanced Technology in Telecommunications (CATT) at Polytechnic University for providing me with the financial support required for my Ph.D. research.
Abbreviations and Acronyms

API Application Programming Interface
APS Automatic Protection Switching
ASIC Application-Specific Integrated Circuit
ATM Asynchronous Transfer Mode
BGP Border Gateway Protocol
BNF Backus-Naur Form
CAC Connection Admission Control
CAM Content Addressable Memory
CCAMP WG Common Control and Measurement Plane Working Group
CR-LDP Constraint-based Routed Label Distribution Protocol
DCC Data Communication Channel
DDR Double Data Rate
DSP Digital Signal Processing
DWDM Dense Wavelength Division Multiplexing
FIFO First-In First-Out
FPGA Field Programmable Gate Array
FTP File Transfer Protocol
GbE Gigabit Ethernet
GMPLS Generalized Multi-Protocol Label Switching
GPU Graphics Processing Unit
HBA Host Bus Adapter
IANA Internet Assigned Numbers Authority
IETF Internet Engineering Task Force
IntServ Integrated Services
IP Internet Protocol
LDCC Line Data Communication Channel
LDP Label Distribution Protocol
LMP Link Management Protocol
LOL Loss of Light
LOS Loss of Signal
LPM Longest Prefix Match
LSB Least Significant Bit
LSP Label Switched Path
LSR Label Switching Router
LVDS Low Voltage Differential Signaling
MAC Media Access Control
MPLS Multi-Protocol Label Switching
MSB Most Significant Bit
MTP Message Transfer Part
NIC Network Interface Card
NSE Network Search Engine
OC Optical Carrier
OCSP Optical Circuit-switched Signaling Protocol
OIF Optical Internetworking Forum
OSPF-TE Open Shortest Path First-Traffic Engineering
OTN Optical Transport Network
PCB Printed Circuit Board
PCI Peripheral Component Interconnect
PCS Physical Coding Sublayer
PNNI Private Network to Network Interface
QoS Quality of Service
RAM Random Access Memory
RFC Request for Comments
RSVP Resource ReSerVation Protocol
RSVP-TE Resource ReSerVation Protocol-Traffic Engineering
RTT Round-Trip Time
SAN Storage Area Network
SD Signal Degrade
SDCC Section Data Communication Channel
SDH Synchronous Digital Hierarchy
SerDes Serializer/Deserializer
SONET Synchronous Optical NETwork
SRAM Static Random Access Memory
SS7 Signaling System 7
STS Synchronous Transport Signal
TCAM Ternary Content Addressable Memory
TCP Transmission Control Protocol
TLV Type-Length-Value
TOE TCP Offload Engine
TTL Time-to-Live
UDP User Datagram Protocol
UNI User Network Interface
VHDL VHSIC Hardware Description Language
VHSIC Very High Speed Integrated Circuits
VLSI Very Large Scale Integration
WDM Wavelength Division Multiplexing
AN ABSTRACT

Hardware-Accelerated Signaling: Design, Implementation and Implications
by
Haobo Wang
Advisor: Malathi Veeraraghavan
Submitted in Partial Fulfillment of the Requirements for the
Degree of Doctor of Philosophy (Electrical Engineering)
November 2004
Despite the dominance of connectionless IP networks, i.e., the Internet, connection-oriented networks are gaining attention because of their inherent support for Quality of Service (QoS). Even IP routers are being enhanced with connection-oriented features. Signaling protocols are used in connection-oriented networks to set up and tear down connections.

Signaling protocols are primarily implemented in software for two reasons: complexity and the requirement for flexibility. Although these are two good reasons for software implementations, the price paid is in performance. Software implementations of signaling protocols are rarely capable of handling over 1000 calls/sec. Correspondingly, per-switch call processing delays are on the order of milliseconds.

We propose to implement signaling protocols in hardware, expecting a 2-3 orders of magnitude improvement in the call handling capacities of switches. As a first step, we define the Optical Circuit-switched Signaling Protocol (OCSP), a performance-oriented signaling protocol designed for SONET switches. We implement OCSP on the WILDFORCE FPGA evaluation board. The simulation results show a call handling rate of 150,000 calls/sec and a per-switch call processing delay of 6.6 µs.

We then choose RSVP-TE for hardware acceleration. As a signaling protocol targeting almost all connection-oriented networks, RSVP-TE for GMPLS is complex and flexible, and not intended for hardware implementation. However, it is not only impractical but also unnecessary to implement the complete signaling protocol in hardware. Instead, we extract a subset of RSVP-TE for hardware acceleration and relegate the functionality beyond this subset to software. The subset is large enough to cover the time-critical operations while small enough to make hardware implementation feasible. We implement the subset on a Xilinx Virtex-II FPGA device, which we call the hardware signaling accelerator. With a fully pipelined architecture and innovative design techniques, a call handling rate of 250,000 calls/sec is achieved, and the per-switch call processing delay is 7.2 µs.

In order to demonstrate a complete switching system equipped with the hardware signaling accelerator, we design a PCI-based prototype board including both user-plane and control-plane devices. The user plane carries the user traffic while the control plane controls the operation of the user plane. The hardware signaling accelerator is the core component of the control plane.

The total call setup delay consists of three components: the round-trip propagation delay, the call processing delays, and the signaling message emission delays. The round-trip propagation delay is limited by the speed of light. With hardware-accelerated signaling, the call processing delays can be reduced to the order of microseconds. The last component is determined by the choice of signaling transport option: in-band signaling or out-of-band signaling. We compare the total call setup delay for both options under different scenarios.

The impact of hardware-accelerated signaling is quite far-reaching. It will fundamentally change our current view of connection-oriented networks. With the per-switch call processing delay decreasing to the order of microseconds and the call handling rate increasing to the order of 10⁵ calls/sec, connections can be established and released much faster, and switches can handle more requests and maintain more simultaneous connections. New architectures and applications that can better exploit hardware-accelerated signaling are proposed and analyzed.
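The three-component decomposition of the total call setup delay can be illustrated with a back-of-the-envelope calculation. This is only a sketch with assumed parameter values (path length, hop count, message size, link rate), not the dissertation's detailed queueing model:

```python
# Illustrative breakdown of total call setup delay into round-trip
# propagation, per-switch processing, and per-hop message emission.
# All parameter values below are assumptions for illustration only.

C = 2.0e8  # assumed propagation speed in fiber, m/s (~2/3 of c)

def call_setup_delay(distance_m, hops, proc_delay_s, msg_bits, link_bps):
    """Round-trip propagation + per-switch processing (forward and
    reverse passes) + per-hop signaling message emission (both passes)."""
    rtt = 2 * distance_m / C
    processing = 2 * hops * proc_delay_s
    emission = 2 * hops * (msg_bits / link_bps)
    return rtt + processing + emission

# Wide-area example: 2000 km path, 5 switches, 500-byte signaling message
dist, hops, msg = 2000e3, 5, 500 * 8

software = call_setup_delay(dist, hops, proc_delay_s=1e-3, msg_bits=msg, link_bps=155e6)
hardware = call_setup_delay(dist, hops, proc_delay_s=7.2e-6, msg_bits=msg, link_bps=1e9)

print(f"software signaling: {software*1e3:.2f} ms")
print(f"hardware signaling: {hardware*1e3:.2f} ms")
# With millisecond software processing, the processing term adds markedly
# to the total; with microsecond hardware processing, the total approaches
# the speed-of-light round-trip bound (20 ms for this path length).
```

The sketch shows why, once call processing drops to microseconds, the propagation and emission components dominate, which is exactly what motivates the in-band versus out-of-band transport comparison.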
Contents

Acknowledgments ..... i
Abbreviations and Acronyms ..... iii
Abstract ..... vi

1 Introduction ..... 1
  1.1 Background ..... 1
  1.2 Problem statement ..... 6
  1.3 Related work ..... 7
  1.4 Outline ..... 9
2 Optical Circuit-switched Signaling Protocol (OCSP): Specification and Implementation ..... 11
  2.1 OCSP - a signaling protocol for performance ..... 11
    2.1.1 Signaling messages ..... 12
    2.1.2 State transition diagram of a connection ..... 13
    2.1.3 Data tables ..... 14
  2.2 Implementation of OCSP on WILDFORCE platform ..... 15
    2.2.1 Parameters ..... 16
    2.2.2 State transition diagram of the hardware signaling accelerator ..... 17
    2.2.3 Managing the available time-slots in hardware ..... 18
    2.2.4 Managing the connection references in hardware ..... 19
    2.2.5 Implementation and simulation results ..... 20
3 A Subset of RSVP-TE for Hardware-Acceleration ..... 23
  3.1 Review of RSVP-TE for GMPLS networks ..... 24
  3.2 A Subset of RSVP-TE for GMPLS networks ..... 25
    3.2.1 RSVP messages ..... 25
      3.2.1.1 RSVP common header ..... 25
      3.2.1.2 RSVP messages ..... 26
    3.2.2 RSVP objects ..... 29
      3.2.2.1 FILTER_SPEC/SENDER_TEMPLATE object ..... 30
      3.2.2.2 FLOWSPEC/SENDER_TSPEC object ..... 30
      3.2.2.3 Generalized LABEL object ..... 31
      3.2.2.4 Generalized LABEL_REQUEST object ..... 32
      3.2.2.5 RSVP_HOP object ..... 33
      3.2.2.6 SESSION object ..... 34
    3.2.3 Optional objects support ..... 34
      3.2.3.1 SUGGESTED_LABEL object ..... 35
      3.2.3.2 MESSAGE_ID/MESSAGE_ID_ACK object ..... 36
    3.2.4 Connection setup and teardown with RSVP-TE signaling messages ..... 37
4 RSVP-TE Procedures for Hardware-Acceleration ..... 38
  4.1 Registers ..... 38
  4.2 Data tables ..... 40
    4.2.1 Routing table ..... 41
    4.2.2 Incoming Connectivity table ..... 41
    4.2.3 Outgoing Connectivity table ..... 42
    4.2.4 Outgoing CAC table ..... 43
    4.2.5 User/Control Mapping table ..... 44
    4.2.6 State table ..... 45
  4.3 Procedures ..... 45
5 An FPGA-Based Hardware Signaling Accelerator ..... 50
  5.1 The architecture of the hardware signaling accelerator ..... 51
  5.2 Functional modules ..... 51
    5.2.1 Register bank ..... 51
    5.2.2 Input message buffering system ..... 53
    5.2.3 Two-level object dispatching ..... 55
    5.2.4 Data table management ..... 56
    5.2.5 Resource (bandwidth) management ..... 58
    5.2.6 Re-transmission management ..... 60
  5.3 Implementation and simulation results ..... 63
  5.4 Re-configurability ..... 65
6 Design of a Prototype Switching System ..... 67
  6.1 Architecture of the PCI-based prototype switching system ..... 68
  6.2 Main on-board devices ..... 70
    6.2.1 1 Gbps signaling channel ..... 70
    6.2.2 Incoming message buffer - FIFO device ..... 71
    6.2.3 Data tables - the TCAM and SRAM devices ..... 74
      6.2.3.1 The TCAM and SRAM devices ..... 74
      6.2.3.2 The organization of the data tables ..... 78
    6.2.4 User-plane device - VSC9182 ..... 81
  6.3 Clock distribution networks ..... 83
  6.4 Power distribution scheme ..... 86
7 Design and Analysis of Signaling Networks ..... 87
  7.1 Introduction to in-band signaling and out-of-band signaling ..... 88
  7.2 Delay models ..... 89
    7.2.1 Assumptions ..... 90
    7.2.2 Notation ..... 91
    7.2.3 Queueing model ..... 91
      7.2.3.1 Model without re-transmissions ..... 92
      7.2.3.2 Model including re-transmissions ..... 94
  7.3 Numerical results and implications ..... 98
    7.3.1 Input parameter values ..... 98
    7.3.2 Hardware-accelerated signaling engine, µtx = µproc ..... 100
    7.3.3 Hardware-accelerated signaling engine, µtx ≪ µproc ..... 102
    7.3.4 Software signaling processor ..... 103
  7.4 Some comments ..... 104
8 Applications of Hardware-Accelerated Signaling ..... 106
  8.1 Utilization analysis of circuit-switched networks for file transfers ..... 106
    8.1.1 Problem statement ..... 106
    8.1.2 Utilization analysis and the numerical results ..... 109
  8.2 Hardware-accelerated signaling and network survivability ..... 113
    8.2.1 Utilization and delay analysis for path restoration ..... 115
9 Conclusions and Future Work ..... 119
List of Figures

Figure 1. Illustration of connection setup. ..... 3
Figure 2. Unfolded view of a switch in connection-oriented networks. ..... 4
Figure 3. GMPLS protocol stack diagram. ..... 4
Figure 4. An example of connection-oriented networks. ..... 5
Figure 5. OCSP signaling messages. ..... 12
Figure 6. State transition diagram of a connection. ..... 13
Figure 7. Data tables used by the signaling protocol. ..... 14
Figure 8. Architecture of WILDFORCE™ board. ..... 15
Figure 9. State transition diagram of the hardware signaling accelerator. ..... 17
Figure 10. Time-slot manager. ..... 18
Figure 11. Connection reference manager. ..... 19
Figure 12. Timing simulation for Setup message. ..... 21
Figure 13. RSVP message common header. ..... 25
Figure 14. RSVP object. ..... 29
Figure 15. TLV structure of the SESSION object. ..... 30
Figure 16. FILTER_SPEC/SENDER_TEMPLATE object. ..... 30
Figure 17. FLOWSPEC/SENDER_TSPEC object. ..... 31
Figure 18. LABEL object. ..... 32
Figure 19. Format of a SONET/SDH label. ..... 32
Figure 20. SONET multiplexing hierarchy and generalized label. ..... 32
Figure 21. LABEL_REQUEST object. ..... 33
Figure 22. RSVP_HOP object. ..... 33
Figure 23. SESSION object. ..... 34
Figure 24. MESSAGE_ID/MESSAGE_ID_ACK object. ..... 36
Figure 25. Illustration of connection setup with RSVP-TE messages. ..... 37
Figure 26. Network used for illustrative examples. ..... 40
Figure 27. Example of Avail_BW. ..... 44
Figure 28. Processing of Common header. ..... 46
Figure 29. Processing of Path message. ..... 46
Figure 30. Processing of SESSION object. ..... 48
Figure 31. At the end of Path message processing. ..... 49
Figure 32. Architecture of the hardware signaling accelerator. ..... 51
Figure 33. Dynamic round-robin style pipelining. ..... 52
Figure 34. Two levels of input message buffering. ..... 53
Figure 35. Buffer management module. ..... 54
Figure 36. TLV-style object processing. ..... 55
Figure 37. Dependencies among the data tables. ..... 57
Figure 38. Priority arbitrator for TCAM. ..... 58
Figure 39. Resource management module. ..... 59
Figure 40. Comparison of resource allocation schemes. ..... 60
Figure 41. Re-transmission management (buffers and timers). ..... 61
Figure 42. Processing of Path message. ..... 64
Figure 43. Processing of Resv message. ..... 64
Figure 44. Processing of PathTear message. ..... 65
Figure 45. Architecture of the prototype board. ..... 68
Figure 46. 1 Gbps signaling channel. ..... 70
Figure 47. Block diagram of L8104. ..... 70
Figure 48. Block diagram of IDT72V36100. ..... 72
Figure 49. RAM and CAM. ..... 74
Figure 50. How a TCAM works. ..... 76
Figure 51. FPGA/TCAM/SRAM configuration. ..... 76
Figure 52. TCAM command bus. ..... 77
Figure 53. 72-bit look-up timing diagram. ..... 78
Figure 54. Configurations of the five data tables in the TCAM device. ..... 79
Figure 55. Use matched address as interface ID. ..... 80
Figure 56. Organization of the data tables. ..... 81
Figure 57. Block diagram of VSC9182. ..... 82
Figure 58. Address bus and data bus. ..... 82
Figure 59. Timing diagram to program the switch fabric. ..... 83
Figure 60. Clock distribution scheme. ..... 84
Figure 61. In-band and out-of-band signaling options. ..... 88
Figure 62. Three cases of in-band signaling. ..... 88
Figure 63. Queueing models of the signaling protocol processor and the signaling channel transmitters. ..... 92
Figure 64. Signaling channel transmitter model including re-transmissions. ..... 95
Figure 65. In-band/out-of-band signaling with hardware signaling, metro area, µtx = µproc. ..... 100
Figure 66. In-band/out-of-band signaling with hardware signaling, wide area, µtx = µproc. ..... 101
Figure 67. In-band/out-of-band signaling with hardware signaling, metro area, µtx ≪ µproc. ..... 102
Figure 68. In-band/out-of-band signaling with hardware signaling, wide area, µtx ≪ µproc. ..... 103
Figure 69. In-band/out-of-band signaling with software signaling, metro area. ..... 104
Figure 70. In-band/out-of-band signaling with software signaling, wide area. ..... 104
Figure 71. Aggregate network utilization vs. offered traffic load. ..... 108
Figure 72. Crossover file size vs. percentage of the offered load. ..... 108
Figure 73. A model for a circuit-switched node operated in call-blocking mode. ..... 109
Figure 74. Crossover file size vs. total utilization (hardware signaling). ..... 111
Figure 75. Crossover file size vs. total utilization (software signaling). ..... 111
Figure 76. Pure software signaling needs more client-side interfaces to create the aggregate load to achieve the desired utilization. ..... 112
Figure 77. A sample network with a single link failure. ..... 115
List of Tables

Table 1. Clock cycles consumed by various OCSP messages. ..... 22
Table 2. Data registers. ..... 38
Table 3. Configuration registers. ..... 40
Table 4. Routing table. ..... 41
Table 5. Incoming Connectivity table. ..... 41
Table 6. Incoming Connectivity table at switch II in Fig. 26. ..... 42
Table 7. Outgoing Connectivity table. ..... 42
Table 8. Outgoing Connectivity table for switch I in Fig. 26. ..... 43
Table 9. Outgoing CAC table. ..... 43
Table 10. Outgoing CAC table for switch I in Fig. 26. ..... 43
Table 11. User/Control Mapping table. ..... 44
Table 12. State table. ..... 45
Table 13. Implementation results. ..... 63
Table 14. Clock cycles to receive an RSVP-TE message. ..... 63
Table 15. L8104 system interface. ..... 71
Table 16. Main interface signals of IDT72V36100. ..... 72
Table 17. Typical TCAM and SRAM signals. ..... 78
Table 18. Signals used to program the switch fabric. ..... 83
Table 19. Devices and related clock signals. ..... 84
Table 20. Power dissipation of main devices. ..... 86
Table 21. Notation. ..... 91
Table 22. Input parameter values. ..... 98
Table 23. Notation. ..... 109
Table 24. Number of circuits needed for a given utilization. ..... 112
Table 25. Notation. ..... 114
Chapter 1 Introduction
1.1 Background
Since the premiere of ARPAnet, the precursor to today’s Internet, in 1969, the reach of
connectionless IP networks has been growing. Nowadays the Internet is so successful that
more than 800 million users around the world use the Internet regularly [1]. The Internet
Assigned Numbers Authority (IANA) has allocated more than 2 billion IP addresses to
countries and regions [2], which account for more than half of the 2^32 total IPv4
addresses. The history of connection-oriented networks dates back even further to 1879,
when the then Bell Telephone Company set up the first telephone network in the world.
Now different types of connection-oriented networks, such as Asynchronous Transfer
Mode (ATM) networks, Synchronous Optical NETworks (SONET), Synchronous Digital
Hierarchy (SDH) networks, and Dense Wavelength Division Multiplexed (DWDM) net-
works, are deployed widely. However, these connection-oriented networks are primarily
used as carriers for IP traffic. For example, Abilene [3], the backbone of Internet2® [4],
leases OC-192c/OC-48c SONET circuits from Qwest Communications to inter-connect
backbone routers. Internet2 provides high-speed IP service to end users while the aggre-
gate user traffic is carried on the SONET circuits.
Two issues typically associated with connection-oriented networks are as follows.
First, in connection-oriented networks, a connection must be established before data trans-
fer can take place. This process is called connection (call) setup, which involves the
exchange of signaling messages between end hosts and the associated per-hop resource
reservation. The connection (call) setup overhead becomes significant if the holding time
of the connection is short, which is typical for today’s end user traffic. Second, once a connection is established, the end-to-end connection is dedicated to the two end systems on
that connection. There is no resource-sharing, which may lead to poor utilization if the
user traffic is bursty.
However, connection-oriented networks have been gaining more attention recently
because of their inherent support for Quality of Service (QoS). Standardization organiza-
tions, such as the Optical Internetworking Forum (OIF) and the Internet Engineering Task Force
(IETF) Common Control and Measurement Plane (CCAMP) working group, are creating
protocols and standards for connection-oriented networks, both packet-switched and circuit-
switched. New network architectures and applications are proposed to better utilize con-
nection-oriented networks. For example, in [5], the authors note that file transfers have no
intrinsic burstiness, and hence identify file transfer as an application that can fully utilize
connections.
Signaling protocols are used in connection-oriented networks to set up and tear down
connections. Examples of signaling protocols include the Signaling System 7 (SS7) in
telephone networks [6], the User Network Interface (UNI) [7] and Private Network to Net-
work Interface (PNNI) [8] signaling protocols in ATM networks, Label Distribution Pro-
tocol (LDP) [9], Constraint-based Routed LDP (CR-LDP) [10] and Resource ReSerVation
Protocol - Traffic Engineering (RSVP-TE) [11] in Multi-Protocol Label Switched (MPLS)
networks, and the extensions of these protocols for Generalized MPLS (GMPLS) [12]-
[15], which includes SONET/SDH and DWDM networks.
Setting up a connection at a switch involves five steps:
1. Determining the next-hop switch toward which the connection should be routed. This
task typically requires a data table look-up. Routes to destinations are pre-computed
using data gathered by routing protocols and stored in data tables.
2. Checking the availability of and reserving required resources (link capacity and
optionally buffer space). This step is generally called connection admission control.
3. Assigning “labels” for the connection. The exact form of the “label” depends on the
type of connection-oriented network. For example, in SONET/SDH switches, a label
identifies time-slots on the incoming and outgoing switch interfaces.
4. Programming the switch fabric to map incoming labels to outgoing labels.
5. Updating the state information associated with the connection.
In a typical connection setup procedure as illustrated in Fig. 1, a Setup message1
requesting the setup of a connection progresses from the calling end device (source)
toward the called end device (destination) hop-by-hop, and a Setup Success message trav-
els in the reverse direction, again hop-by-hop. In this scenario, the first three steps should
be performed in the forward direction so that resources are reserved as the Setup proceeds,
while the remaining steps could be performed as signaling message proceeds in the for-
ward direction or in the reverse direction. Other variants of this procedure are possible
such as reverse direction resource reservation [16].
1 Here we use a generic name for the message, i.e., Setup. Different signaling protocols call this message by differentnames, e.g., Setup in ATM UNI, Label Request in CR-LDP, or Path in RSVP-TE.
[Figure elements: source and destination end devices connected through switches SW1-SW5; Setup messages (steps 1-4) travel hop-by-hop from source to destination, Setup Success messages (steps 5-8) travel back hop-by-hop, and in step 9 the connection (circuit or virtual circuit) is established.]

Figure 1. Illustration of connection setup.
Upon completion of data transfer, the connection is released with a similar end-to-end
release procedure. Typically release messages are also confirmed. Switches processing the
release messages free up bandwidth, optionally buffer, and label resources for usage by
the next connection.
To support the above-described connection setup and release procedures, signaling
messages with parameters in each message, some mandatory and some optional, are
defined in a typical signaling protocol. In addition, other messages to support notifica-
tions, keep-alive exchanges, etc., are also present in signaling protocols.
With regards to implementation, we illustrate a typical architecture of a switch
(unfolded view) in connection-oriented networks in Fig. 2. The user-plane hardware con-
sists of a switch fabric and line cards that terminate input/output interfaces carrying user
traffic. In packet switches, the line cards perform network-layer protocol processing to
determine how to forward packets. In circuit switches, the line cards are typically multi-
plexers/demultiplexers.
Figure 2. Unfolded view of a switch in connection-oriented networks.
[User plane: line cards terminating the input/output interfaces, interconnected by a switch fabric. Control plane: NICs terminating the input/output signaling interfaces, a hardware signaling accelerator, and a microprocessor running the routing, signaling, and LMP processes.]

[Protocol stack: the routing (OSPF-TE, BGP), signaling (RSVP-TE, CR-LDP), and LMP processes running over TCP, UDP, and IP.]
Figure 3. GMPLS protocol stack diagram.
The control-plane unit typically consists of three modules, routing, link management,
and signaling. For example, the IETF CCAMP working group is defining the protocols for
GMPLS regarding all three aspects, such as Open Shortest Path First - Traffic Engineering
(OSPF-TE) [17] for routing, Link Management Protocol (LMP) [18] for link manage-
ment, and RSVP-TE/CR-LDP for signaling, as shown in Fig. 3. These control-plane mod-
ules are typically implemented as software processes residing on the microprocessor,
although in Fig. 2, we show a hardware signaling accelerator to speed up the processing of
signaling messages. Network Interface Cards (NICs) are shown on the control-plane.
These cards are used to process the lower layers of the signaling protocols on which the
signaling messages are carried. For example, in SS7 networks, the NICs process the Mes-
sage Transfer Part (MTP) layers, which are the lower layers of the SS7 protocol stack. In
optical networks, the expectation is that an out-of-band IP network will be used to carry
signaling messages between switches. In this case, the NICs may be Ethernet cards. It is
also possible to carry the signaling messages on the same interface as the user data. For
example, the Data Communication Channel (DCC) in each SONET/SDH signal is often
used for signaling. It is referred to as “in-band signaling.” Fig. 4 shows an example of con-
nection-oriented network, where user-plane and control-plane are located in different net-
works (“out-of-band signaling”).
Figure 4. An example of connection-oriented networks.
1.2 Problem statement
Signaling protocols are primarily implemented in software for two important reasons.
First, signaling protocols are quite complex with many messages, parameters and proce-
dures, especially the signaling protocols for GMPLS, which target almost all connection-
oriented networks. Second, signaling protocols are updated frequently requiring a certain
amount of flexibility for upgrading field implementations. For example, RSVP-TE with
extensions for GMPLS evolved from RSVP-TE for MPLS, which, in turn, evolved from
RSVP for IP networks. Request for Comments (RFCs) and Internet drafts are being con-
tinuously proposed and updated to enhance RSVP-TE. Hence an implementation of
RSVP-TE must be capable of frequent upgrades. While these are two good reasons for
implementing signaling protocols in software, the price paid is in performance. Signaling
protocol implementations in software are rarely capable of handling over 1000 calls/sec.
Correspondingly, call setup delays per-switch are in the order of milliseconds [19].
Connection-oriented networks have been gaining more attention because of their
inherent QoS support. Even IP routers are being enhanced with connection-oriented fea-
tures. For example, MPLS is being added to IP networks along with RSVP-TE to establish
Label Switched Path (LSP) tunnels. These trends require a break-through in call handling
capacities. The problem statement of this research is to determine whether signaling proto-
cols can be implemented in hardware, and if so, to demonstrate it with an actual imple-
mentation [20]. In addition, we explore the impact of hardware-accelerated signaling
protocol implementations and study the related question of how to reduce signaling mes-
sage transmission delays.
Implementing signaling protocols in hardware is a great challenge considering the
complexity and the requirement for flexibility of such protocols. The complexity problem
can be partly overcome by defining a subset of signaling protocols for hardware accelera-
tion. This subset should cover most time-critical operations. Nevertheless, innovative
hardware implementation techniques are required. Towards solving the problem of flexi-
bility, we propose to use re-configurable hardware, i.e., Field Programmable Gate Arrays
(FPGAs) as the hardware platform for implementation. FPGAs can be re-configured as
signaling protocols evolve.
Our implementation is targeted for transit switches in SONET networks, and it sup-
ports point-to-point unidirectional connections.
1.3 Related work
There are many signaling protocol standards as listed in Section 1.1. In addition, many
other signaling protocols have also been proposed in the literature [21]-[26]. Some of
these protocols such as fast reservation schemes [21][22], YESSIR [23], PCC [24] have
been designed to achieve low call setup delays by improving the signaling protocols them-
selves. Fast Reservation Protocol (FRP) [25] is the only signaling protocol that has been
implemented in Application Specific Integrated Circuit (ASIC) hardware. Such an ASIC
implementation is inflexible because upgrading the signaling protocol implementation
entails a complete re-design of the ASIC. Recently, a simple signaling protocol called
“JumpStart”, which was designed for hardware implementation, was proposed in [26] for
burst-switched networks.
Because of the popularity of RSVP-TE for MPLS networks and the desire for scalable
QoS support in large IP networks, some researchers have also explored hardware-acceler-
ated implementation of the RSVP-TE signaling protocol. In [27], the “Keep-It-Simple”
Signaling (KISS) protocol, a simplified version of RSVP-TE for hardware-acceleration,
was proposed. They implemented the KISS protocol in software for debugging and proto-
col development. They also discussed possible approaches for hardware implementation
and estimated the achievable performance, but no hardware implementation was carried
out. Because the KISS protocol is not compatible with RSVP-TE or any other available signaling
protocol, it cannot inter-operate with any deployed switches or MPLS-enabled IP routers.
The KISS protocol is targeted for MPLS-enabled IP networks, and will not work with
other connection-oriented network technologies.
Other comparable protocols implemented in hardware include TCP. In [28], Benz pro-
posed to implement the “normal” TCP functionalities in hardware and handle complex
functionalities, such as congestion control and error control, in software. He implemented his
approach on the Myrinet® platform. A similar concept, TCP Offload Engine (TOE) [29],
is gaining some popularity in today’s market. These solutions off-load part of the TCP
functionalities from the CPU to a co-processor located at a NIC or a Host Bus Adapter
(HBA, the NIC equivalent in Storage Area Networks).
Molinero-Fernandez and Mckeown [30] proposed to implement a technique called
TCP switching, in which the TCP SYNchronize segment is used to trigger connection
setup and TCP FINish segment is used to trigger connection release. By processing these
segments inside switches, the TCP SYN/FIN procedures become comparable to a signal-
ing protocol for connection setup/release. The authors planned to implement this tech-
nique in FPGAs.
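The flag test at the heart of TCP switching is tiny: a switch need only inspect two header bits to drive connection management. A sketch of that classification follows (flag bit values and the flag-byte offset are per the TCP specification, RFC 793; this is not code from [30]):

```python
SYN, FIN = 0x02, 0x01   # TCP flag bits (RFC 793)

def signaling_action(tcp_header: bytes):
    flags = tcp_header[13]        # byte 13 of the TCP header holds the flag bits
    if flags & SYN:
        return "setup"            # SYN segment triggers connection setup
    if flags & FIN:
        return "release"          # FIN segment triggers connection release
    return None                   # ordinary segment: forward it, no signaling
```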
1.4 Outline
This dissertation is organized as follows.
In Chapter 2, we discuss our work on the design and implementation of a new perfor-
mance-oriented signaling protocol called Optical Circuit-switched Signaling Protocol
(OCSP) [31], which we define for SONET networks. While in typical signaling protocol
definitions, flexibility is the primary concern, our primary goal in designing this protocol
is to achieve high-performance implementation.
Concurrent to our definition and implementation of OCSP, the optical networking
industry defined new signaling protocols for SONET/SDH/DWDM networks. Of these,
RSVP-TE for GMPLS is the one that has gained popularity, and is now widely imple-
mented by many vendors. This makes it important to demonstrate our concept of hardware
accelerated signaling with an implementation of RSVP-TE.
As a generalized signaling protocol targeting almost all connection-oriented networks,
RSVP-TE for GMPLS is quite complex and not intended for hardware implementation. To
solve this challenge we define a subset of RSVP-TE [32] for hardware acceleration and
relegate the remaining functionality to software. The former should be large enough to
handle most signaling messages and yet small enough to make hardware implementation
feasible. Chapter 3 describes this work.
In Chapter 4, we describe the procedures needed to implement this subset of RSVP-
TE [33]. Registers and data tables used in the hardware implementation are defined.
Based on the subset of RSVP-TE defined in Chapters 3 and 4, Chapter 5 describes the
architecture of hardware implementation. It is implemented on a Xilinx Virtex-II FPGA,
which we call the “hardware signaling accelerator [34].” We explain the main functional
modules in detail, present the implementation and simulation results. We also address the
issue of re-configurability.
In order to demonstrate a real switching system, in which a switch fabric can work
under the control of the hardware signaling accelerator described in Chapter 5, we build a
PCI-based prototype board. We present this work in Chapter 6. The board-level architec-
ture is given. Main on-board devices are discussed in detail. We also explain the clock and
power distribution schemes.
In Chapter 7 we discuss a related issue on how to reduce signaling message transmis-
sion delays. We compare the call setup delays of two options: in-band signaling and out-
of-band signaling under different scenarios. This issue is important because the selection
of signaling transport mechanisms affects the total call setup delay. Gains in call setup
delay achieved through hardware-accelerated signaling must be complemented by sound
design of the signaling transport mechanism.
Our work on hardware-accelerated implementation of signaling protocols has a far-
reaching impact on connection-oriented networks. With decreased call setup delay and
increased call handling rate, connections can be established and released much faster, and
switches can handle more requests and maintain more simultaneous connections. In
Chapter 8, we model and analyze the impact of hardware-accelerated signaling on current
connection-oriented networks, and propose applications that can fully exploit the benefit
of hardware-accelerated signaling.
Chapter 2 Optical Circuit-switched Signaling Protocol
(OCSP): Specification and Implementation
In Chapter 1, we mentioned many signaling protocols. Those protocols are complex
and flexible, and not intended for hardware implementation. In order to demonstrate the
feasibility and advantage of hardware-accelerated signaling, we design and implement a
signaling protocol called OCSP, which is specifically engineered for SONET networks.
While in typical signaling protocol definitions, flexibility is a primary concern, our pri-
mary goal in designing this protocol is to achieve high-performance implementations.
We introduce OCSP in Section 2.1. It is not a complete signaling protocol specifica-
tion. Our approach is to define a subset large enough that a significant percentage of user
requirements can be handled with this subset. Infrequent operations are delegated to the
software signaling process. Therefore, often in this description, we will leave out details
that are handled by the software. In Section 2.2, we demonstrate the hardware implemen-
tation of OCSP. The hardware platform, the design, and the implementation and simula-
tion results are discussed.
2.1 OCSP - a signaling protocol for performance
In this section, we first introduce four OCSP signaling messages used to set up (Setup
and Setup-Success), and tear down (Release and Release-Confirm) connections. We then
give the state transition diagram of a connection at each switch. Finally we discuss the
data tables we introduce to facilitate the hardware implementation of the OCSP signaling
protocol.
2.1.1 Signaling messages
We define four signaling messages, Setup, Setup-Success, Release, and Release-Con-
firm. Fig. 5 illustrates the detailed fields of these four messages.
The Message Length field specifies the length of a message. The Time-to-Live (TTL)
field has the same meaning as in IP header. The Message Type field distinguishes four dif-
ferent messages. The Connection Reference identifies a connection locally. The Source IP
Address, Destination IP Address, and Previous Node IP Address specify the end hosts and
the previous node respectively. The Bandwidth field specifies the bandwidth requirement
of a connection. The Interface/time-slot pairs are used to program the switch fabric. If
there are an odd number of interface/time-slot pairs, a 16-bit pad is inserted because the
message is 32-bit aligned. The Checksum field covers the whole message.
In Setup-Success message, the Bandwidth field records the allocated bandwidth. In
Release and Release-Confirm messages, the Cause field explains the reason of release.
The Message Length, Message Type and Connection Reference fields are common to all
messages and occupy the same relative position. Such an arrangement simplifies hardware
design.
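To make the byte layout concrete, here is a hedged sketch of building a Setup message in software. The split of the first 32-bit word (8-bit message length in words, 8-bit TTL, 4-bit message type, 12-bit connection reference), the 8-bit interface and time-slot sub-fields, and the use of an IP-style one's-complement checksum are all assumptions made for illustration; the dissertation does not pin these widths down in the text:

```python
import struct

def ones_complement_checksum(data: bytes) -> int:
    # IP-style 16-bit one's-complement checksum (assumed algorithm).
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def pack_setup(ttl, conn_ref_prev, dst_ip, src_ip, prev_ip, bandwidth, pairs):
    body = struct.pack("!III", dst_ip, src_ip, prev_ip)   # three IP addresses
    body += struct.pack("!HH", bandwidth, 0)              # bandwidth + reserved
    for interface, slot in pairs:                         # interface/time-slot pairs
        body += struct.pack("!BB", interface, slot)
    if len(pairs) % 2:                                    # odd pair count: 16-bit pad
        body += b"\x00\x00"                               # keeps the message aligned
    length = (4 + len(body) + 4) // 4                     # length in 32-bit words
    header = struct.pack("!BBH", length, ttl,
                         (0b0001 << 12) | (conn_ref_prev & 0x0FFF))
    msg = header + body + struct.pack("!HH", 0, 0)        # pad + zeroed checksum
    csum = ones_complement_checksum(msg)                  # covers the whole message
    return msg[:-2] + struct.pack("!H", csum)
```

With a one's-complement checksum, re-summing the finished message yields zero, which is how a receiver would verify it.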
Figure 5. OCSP signaling messages. [Field layout, 32-bit aligned:
Setup: Message Length | TTL | Msg. Type (0001) | Connection Ref. (prev.) | Destination IP Address | Source IP Address | Previous Node IP Address | Bandwidth | Reserved | Interface Number/Timeslot Number pairs | Pad Bits | Checksum.
Setup-Success: Message Length | Bandwidth | Msg. Type (0010) | Connection Ref. (prev.) | Connection Ref. (own) | Reserved | Checksum.
Release/Release-Confirm: Message Length | Cause | Msg. Type (0011/0100) | Connection Ref. (prev.) | Connection Ref. (own) | Reserved | Checksum.]
2.1.2 State transition diagram of a connection
Each connection passes through a certain sequence of states at each switch. In our pro-
tocol, we define four states: Setup-Sent, Established, Release-Sent and Closed. Fig. 6
shows the state transition diagram. Initially, the connection is in the Closed state. When a
switch receives a Setup message, if resources are available, the switch accepts the request,
reserves the resources, sends the message to the next switch on the path, and marks the
state of the connection as Setup-Sent. When a switch receives a Setup-Success message,
which means all switches along the path have successfully established the connection, the
state of the connection changes to Established. Release-Sent means the switch has
received a Release message, freed the allocated resources, and sent the message to the pre-
vious node. After the switch receives a Release-Confirm message, the connection is suc-
cessfully terminated, the state of the connection returns to Closed.
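This life cycle maps directly onto a small transition table. The sketch below uses the state encodings from Fig. 6; treating any message that does not match the current state as a "mismatch" is illustrative shorthand here, since the handling of mismatches is part of the error paths described later:

```python
# State encodings as given in Fig. 6.
ENCODING = {"Closed": 0b0001, "Setup-Sent": 0b0010,
            "Established": 0b0011, "Release-Sent": 0b0100}

# (current state, received message) -> next state
TRANSITIONS = {
    ("Closed", "Setup"): "Setup-Sent",
    ("Setup-Sent", "Setup-Success"): "Established",
    ("Established", "Release"): "Release-Sent",
    ("Release-Sent", "Release-Confirm"): "Closed",
}

def next_state(state, message):
    # Any unexpected message falls through as a state mismatch.
    return TRANSITIONS.get((state, message), "mismatch")
```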
Figure 6. State transition diagram of a connection. [Closed (0001) to Setup-Sent (0010) on Setup received; Setup-Sent to Established (0011) on Setup Success received; Established to Release-Sent (0100) on Release received; Release-Sent to Closed on Release Confirm received; Setup-Sent to Closed on Release or Release Confirm received.]
2.1.3 Data tables
There are five data tables associated with the signaling protocol, namely, Routing
table, Connection Admission Control (CAC) table, Connectivity table, State table, and
Switch-Mapping table, as shown in Fig. 7. The Routing table is used to determine the next-
hop switch. The index is the destination address; the fields include the address of the next
switch and the corresponding output interface. The CAC table maintains the available
bandwidth on the interfaces leading to neighboring switches. The Connectivity table is
used to map the interface numbers used at neighboring switches to local interface num-
bers. This information will be used to program the switch fabric.
The State table maintains the state information associated with each connection. The
connection reference is the index into the table. The fields include the connection refer-
ences and addresses of the previous and next switches, the bandwidth allocated for the
connection, and most importantly, the state information as defined in Fig. 6.
Switch fabrics, such as PMC-Sierra® PM5372, Agere® TDCS6440G and Vitesse®
VSC9182, have similar programming interfaces. For example, VSC9182 has an 11-bit
address bus A[10:0] and a 10-bit data bus D[9:0]. The switch is programmed by presenting
the output interface/time-slot pair on A[10:0] and the input interface/time-slot pair on
D[9:0]. We define a generic Switch-Mapping table to emulate this programming interface,
with the connection reference as the index and the incoming interface/time-slot pair and
the outgoing interface/time-slot pair as the fields.

Figure 7. Data tables used by the signaling protocol. [Table schemas:
Routing table: index = destination address; return = next node address, next node interface#.
CAC table: index = next node address; return/written = total bandwidth, available bandwidth.
Connectivity table: index = neighbor address, neighbor interface#; return = own interface#.
State table: index = own connection reference; return/written = previous/next connection references, previous/next node addresses, state, bandwidth.
Switch-Mapping table: index = own connection reference, sequential offset (0 to BW-1); return/written = incoming channel ID (interface#, timeslot#), outgoing channel ID (interface#, timeslot#).]
2.2 Implementation of OCSP on WILDFORCE platform
To demonstrate the feasibility and advantage of hardware accelerated signaling, we
implement OCSP on FPGA. We use the WILDFORCETM re-configurable computing
board shown in Fig. 8. It consists of five XC4000 series Xilinx FPGAs: one XC4036
(CPE0) and four XC4013 (PEn). These five FPGAs can be used for user logic while the
cross-bar provides programmable interconnections between the FPGAs. In addition, there
are three First-In First-Out (FIFO) devices on the board, and one dual-port Random
Access Memory (RAM) device attached to CPE0. The board is connected to the host sys-
tem through the PCI bus. The board supports a C language based Application Program-
ming Interface (API) through which the host system can dynamically configure the
FPGAs and access the on-board FIFOs and RAMs.
Figure 8. Architecture of WILDFORCETM board.
For our prototype implementation, we use CPE0, PE1, FIFO0, FIFO1 and dual-port
RAM. The CPE0 implements the hardware signaling accelerator state machine, State and
Switch-Mapping tables, FIFO0 controller, and dual-port RAM controller. The dual-port
RAM implements the Routing, CAC and Connectivity tables. FIFO0 and FIFO1 work as
receive and transmit buffers for signaling messages. PE1 implements the FIFO1 controller
and provides the data path between CPE0 and FIFO1.
In the following subsections, we discuss some parameters selected for our hardware
implementation, and the state transition diagram of the hardware signaling accelerator. We
also present two novel approaches for managing resources such as time-slots and connec-
tion references.
2.2.1 Parameters
Our implementation supports 5-bit addresses for all nodes, 16 interfaces per switch, a
maximum bandwidth of 16 STS-1s per interface, and a maximum of 32 simultaneous con-
nections at each switch. We are primarily limited by the size of the on-board dual-port
RAM on our off-the-shelf prototype board, and hence are forced to use these small values
for address sizes, etc. The FPGA implementation itself does not limit these parameters.
Hence, we can easily increase the values of these parameters when designing a customized
board.
Recently, there has been significant progress in fast table look-up in both research lit-
erature [35][36] and commercial products. Look-up/classification co-processors are
widely available, such as Silicon Access Networks® iAP, PMC-Sierra® ClassiPI, Solidum®
PAX1200, etc. These chips can easily process 100 million look-ups/sec. In our prototype
implementation, we assume that routing table look-ups can be off-loaded to an external
co-processor, and use an equivalent four-memory-access duration to simulate a routing
table look-up.
2.2.2 State transition diagram of the hardware signaling accelerator
Fig. 9 shows the detailed state transition diagram of the hardware signaling accelera-
tor. When a signaling message arrives, it is temporarily buffered in FIFO0. The hardware
signaling accelerator then reads the message from FIFO0 and delimits the message
according to the Message Length field. The Checksum field is verified. The State table is
consulted to check the current state of the connection. Based on the Message Type field,
the hardware signaling accelerator processes the message accordingly. The processing of
the Setup message involves checking the TTL field, reading the Routing table to determine
the next switch and corresponding output interface, updating the CAC table, reading the
Connectivity table to determine the input interface, allocating a connection reference to
identify the connection, allocating time-slots and programming the Switch-Mapping table.
The Setup-Success message requires no special processing. The processing of the Release
message involves updating the CAC table and releasing the time-slots reserved for the connection.

Figure 9. State transition diagram of the hardware signaling accelerator. [Processing states include reset, ready, receive, parse, check TTL, read the Routing/CAC/Connectivity/State tables, allocate a connection reference and time-slots, write the Switch-Mapping and State tables, free time-slots and the connection reference, and transmit; error transitions cover checksum failure, state mismatch, TTL expiry, no route, unavailable bandwidth, and invalid parameters.]

When processing the Release-Confirm message, the allocated connection refer-
ence is freed and thus, the connection is terminated. After processing any message, the
State table is updated. The new message is generated and buffered in FIFO1 temporarily,
and then transmitted to the next switch along the path.
2.2.3 Managing the available time-slots in hardware
The management of time-slots and connection references is easy in software through
simple array manipulations. However, this poses a challenge in hardware implementa-
tions. Our solution is to use a priority decoder.
Fig. 10 illustrates our implementation of a time-slot manager. Each entry in the time-
slot table is a bit-vector, corresponding to an output interface with the bit-position deter-
mining the time-slot number and the bit-value determining availability of the time-slot
(‘0’ available, ‘1’ used). The priority decoder is used to select the first available time-slot.
When an interface number is provided by the signaling state machine to the time-slot man-
ager, the bit-vector corresponding to the interface is sent into the priority decoder and the
first available time-slot is returned. Then the bit corresponding to the time-slot is marked
as used (from ‘0’ to ‘1’) and the updated bit-vector is written back to the table.

Figure 10. Time-slot manager. [The interface number indexes a table of per-interface bit-vectors; the selected vector is loaded into a scratch register, the priority decoder returns the first available time-slot, the corresponding bit is marked used, and the vector is written back.]

In the
example shown in Fig. 10, the time-slot manager is required to find a free time-slot on
interface 3. It returns time-slot 14 and marks it as “used.” De-allocating a time-slot fol-
lows a similar pattern but the time-slot number is needed as an input in addition to the
interface number in order for the time-slot manager to free the time-slot.
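In software terms the priority decoder is a find-first-zero scan over the interface's bit-vector. A sketch with the prototype's parameters (16 interfaces, 16 time-slots each); the hardware scans in a fixed priority order, assumed here to start from slot 0:

```python
NUM_INTERFACES, SLOTS_PER_IF = 16, 16
# One bit-vector per interface; bit value '1' means the time-slot is in use.
timeslot_table = [0] * NUM_INTERFACES

def alloc_timeslot(interface):
    vec = timeslot_table[interface]
    for slot in range(SLOTS_PER_IF):          # priority decoder: first '0' bit
        if not (vec >> slot) & 1:
            timeslot_table[interface] = vec | (1 << slot)   # mark used, write back
            return slot
    return None                               # no free time-slot on this interface

def free_timeslot(interface, slot):
    # De-allocation needs the time-slot number as well as the interface number.
    timeslot_table[interface] &= ~(1 << slot)
```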
2.2.4 Managing the connection references in hardware
A connection reference is used to identify a connection locally. It is allocated when
establishing a connection and de-allocated when terminating it. A straightforward imple-
mentation of a connection reference manager is a bit-vector combined with a priority
decoder. The priority decoder finds the first available bit-position (a bit marked as ‘0’),
sends its index as the connection reference and updates the bit as used (a bit marked as
‘1’). However, this approach is impractical when there are a large number of connections.
While our actual implementation only used 32 connections per switch, we designed the
connection reference manager to handle 2^12 = 4096 simultaneous connections, which requires a
bit-vector with 4096 entries. This is too large for the simple priority decoder implementation as used for time-slots.

Figure 11. Connection reference manager. [An 8-bit table pointer selects one of 256 rows of 16-bit availability vectors; the selected row is loaded into a scratch register, the priority decoder finds the first ‘0’ bit, the output connection reference is the concatenation of the table pointer and the 4-bit index, the bit is marked used, and the updated vector is written back.]
Our improvement to this basic approach is to use a table with 256 entries of 16-bit vec-
tors to record the availability of a total of 4096 connection references. Fig. 11 illustrates
this approach. With 4096 connections, we need a 12-bit connection reference. The first 8
bits of the connection reference correspond to the table pointer, while the remaining 4 bits
correspond to the first available connection reference from among the 16 pointed to by the
table pointer. The connection reference manager starts with the table pointer set to 0. If
any of the 16 connection references corresponding to this row of the table are available
(i.e., a bit position is ‘0’), the priority decoder will identify this index and write the output
connection reference as a concatenation of the 8-bit table pointer and the 4-bit index
extracted. In the example shown in Fig. 11, the 12th bit in the first row is a ‘0’. Therefore it
outputs the connection reference number 12. The bit-position is marked as used as illus-
trated with steps 5 - 7 of Fig. 11.
De-allocating follows a similar approach; the bit corresponding to the connection ref-
erence is reset to ‘0’ and the updated bit-vector is written back to the table. We can paral-
lelize this approach by partitioning the table into several smaller tables, each with a
pointer and a priority decoder, forming several smaller managers. All these managers
work concurrently. A round-robin style counter can be used to choose a connection reference among the managers. Thus, this approach can be generalized if more than 4096 connections are to be handled.
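The two-level allocator described above can be sketched in software as follows. This is our C illustration of the scheme, not the thesis's VHDL; the table dimensions match the text (256 rows of 16-bit vectors), the priority decoder is emulated with a bit scan, and all names are ours.

```c
#include <stdint.h>

#define ROWS 256          /* addressed by the 8-bit table pointer */
#define BITS_PER_ROW 16   /* addressed by the 4-bit index         */

typedef struct {
    uint16_t avail[ROWS]; /* bit = '1' means reference in use, '0' free */
    int row;              /* current table pointer */
} connref_mgr;

/* Allocate the first free 12-bit connection reference, or -1 if none. */
static int connref_alloc(connref_mgr *m)
{
    for (int scanned = 0; scanned < ROWS; scanned++) {
        uint16_t v = m->avail[m->row];
        if (v != 0xFFFF) {                    /* this row has a free slot */
            int idx = 0;
            while (v & (1u << idx))           /* emulated priority decoder */
                idx++;
            m->avail[m->row] |= (1u << idx);  /* mark used, write back */
            return (m->row << 4) | idx;       /* concat pointer and index */
        }
        m->row = (m->row + 1) % ROWS;         /* pointer adjust */
    }
    return -1;                                /* all 4096 references in use */
}

/* De-allocate: clear the bit addressed by the 12-bit reference. */
static void connref_free(connref_mgr *m, int ref)
{
    m->avail[(ref >> 4) & 0xFF] &= ~(1u << (ref & 0xF));
}
```

Allocation is O(1) in the common case because the table pointer lingers on a row until it fills, mirroring the hardware's behavior.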
2.2.5 Implementation and simulation results
We develop a prototype VHDL model for the hardware signaling accelerator, use the Synplify tool for synthesizing the design, and the Xilinx Alliance tool for the placement and routing of the design. PE0 (XC4036) uses 62% of its resources, while PE1 (XC4013) uses 8% of its resources.
We perform timing simulations of the hardware signaling accelerator using ModelSim
simulator. The simulation results for the Setup message are shown in Fig. 12. It can be
seen that while receiving and transmitting a Setup message (requesting a bandwidth of
STS-12 at a cross-connect rate of STS-1) consumes 12 clock cycles each, processing of a Setup message consumes 53 clock cycles. Overall, this translates into 77 clock cycles to receive, process and transmit a Setup message. We estimate a maximum of 101 clock cycles if multiple (four) routing table look-ups are needed.
Processing Setup-Success, Release and Release-Confirm messages consumes about 70 clock cycles total since these messages are much shorter (2 32-bit words versus 11 32-bit words for Setup) and require simpler processing. A detailed breakdown of the clock cycles consumed to process each of these signaling messages is shown in Table 1. Assuming a 25MHz clock, this translates into 3.1 to 4.0 microseconds for Setup message processing and about 2.8 microseconds for the combined processing of Setup-Success, Release and Release-Confirm messages. Thus, a complete setup and teardown of a connection consumes about 6.6 microseconds. Accordingly, the achievable call handling rate
Figure 12. Timing simulation for Setup message (receive, process and transmit phases).
is as high as 150,000 calls/sec, a 100X-1000X speed-up compared to software-based implementations of signaling protocols.
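The arithmetic behind these figures can be checked with a small C sketch; the 25 MHz clock and the cycle counts are taken from the text above, and the helper names are ours.

```c
/* Convert cycle counts at a 25 MHz clock into microseconds, and bound
 * the call handling rate by the setup + teardown latency. */

static double us_at_25mhz(int cycles)
{
    return cycles / 25.0;              /* 25 cycles per microsecond */
}

static double calls_per_sec(double setup_teardown_us)
{
    return 1e6 / setup_teardown_us;    /* one call per setup + teardown */
}
```

For example, 77 cycles yield about 3.1 us, 70 cycles yield 2.8 us, and a 6.6 us setup-plus-teardown total bounds the rate at roughly 150,000 calls/sec.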
Table 1. Clock cycles consumed by various OCSP messages.
Message: Setup | Setup-Success | Release | Release-Confirm
Clock cycles: 77-101 | 9 | 51 | 10
Chapter 3 A Subset of RSVP-TE for Hardware-
Acceleration
The hardware implementation of OCSP demonstrated a call handling rate of 150,000 calls/sec, which is a 100X-1000X speedup vis-à-vis its software counterpart. However, OCSP, designed by us as a proof of concept, is specifically engineered for performance and
limited to SONET networks. It is important to demonstrate our concept of hardware-accel-
erated signaling with an implementation of a widely used signaling protocol.
Among the signaling protocols we mentioned in Chapter 1, RSVP-TE for GMPLS networks attracts our attention. It is one of the most widely implemented signaling protocols. As one of the two signaling protocols recommended for GMPLS (the other is CR-LDP), RSVP-TE can work with different types of connection-oriented networks, such as SONET/SDH and DWDM networks. Most telecom equipment vendors support
RSVP-TE in their switches.
Implementing RSVP-TE for GMPLS in hardware is a challenge because of its com-
plexity. Our solution is to only implement the time-critical functions in hardware and rele-
gate the non-time-critical functions to software. For this purpose, it is essential to extract a
subset of RSVP-TE for hardware-acceleration. In Section 3.1, we briefly review the his-
tory of RSVP-TE for GMPLS. We then detail the subset we extract in Section 3.2. This
subset is targeted for implementation at a transit SONET switch and it only supports point-
to-point unidirectional connections.
3.1 Review of RSVP-TE for GMPLS networks
The original RSVP was initially defined as a resource reservation setup protocol for
IntServ [37]. RSVP can be used by a host to request specific QoS from the network. It can
also be used by routers to deliver QoS requests to all nodes along the path, and to establish
and maintain state to provide the requested service.
RSVP was designed to work in a multicast scenario. In order to efficiently accommo-
date large groups, dynamic group membership, and heterogeneous receiver requirements,
RSVP makes receivers responsible for requesting a specific QoS. RSVP requests
resources for simplex flows, i.e., in one direction only. To reserve resources in both directions, two separate sets of RSVP message exchanges are required.
When the MPLS community was looking for a signaling protocol for MPLS, they first
came up with LDP and CR-LDP, which are specifically designed for MPLS. At the same
time, an enhanced version of RSVP with traffic engineering extensions (RSVP-TE) [11]
was introduced as an alternative for MPLS signaling. RSVP-TE and CR-LDP are functionally similar, but RSVP-TE is more popular in the industry.
RSVP-TE was first designed for packet-switched MPLS networks. When MPLS was
generalized for circuit-switched networks, such as SONET/SDH and DWDM networks, RSVP-TE was also extended to support these network technologies [15]. In addition,
technology-specific extensions were defined to address technology-specific issues. For
example, [38][39] defined the traffic parameters and labels for SONET/SDH networks,
and G.709 Optical Transport Networks (OTNs), respectively.
RSVP supports the notion of soft-state and periodic refreshing. If a refresh is not
received before the time-out interval expires, connections are released. Since packet forwarding is based on the IP routing table, resource reservations need to follow as routing data changes. Hence RSVP included the use of refresh messages. A second reason
for refresh messages is that since RSVP uses unreliable IP service, the occasional loss of
an RSVP message is handled through refreshes. However, the first reason does not hold for GMPLS networks, where once a circuit is established, the routing table is not consulted for data forwarding. Moreover, refreshing is not a good solution to the reliability issue: if
the refresh interval is small, the overhead spent in processing refresh messages can
become excessive; while if the refresh interval is large, it takes longer to detect the loss of
an RSVP message. Therefore RFC2961 [40] introduced a TCP-like mechanism to ensure
reliable message transport.
The RSVP-TE and related protocols are revised often and enhanced to accommodate
new applications while maintaining backward compatibility. The subset we describe in the
next section is mainly based on the original RSVP [16], the RSVP-TE for MPLS networks
[11], the RSVP-TE extensions for GMPLS networks [15], the SONET/SDH extensions
for GMPLS networks [38], and the RSVP refresh overhead reduction extensions [40].
3.2 A Subset of RSVP-TE for GMPLS networks
3.2.1 RSVP messages
All RSVP messages begin with a common header, followed by a body consisting of a
variable number of variable-length, typed “objects”.
3.2.1.1 RSVP common header
Fig. 13 shows the structure of the RSVP common header. The 4-bit Vers field specifies the protocol version number. Current RSVP (including RSVP-TE and its extensions) is version 1. RSVP reserves 4 bits for Flags. No flag bits are defined in [11][15][16]; in [40], bit 0 was introduced as the Refresh-Reduction-Capable bit, whose value ‘1’ indicates support for the refresh overhead reduction extensions. The RSVP_Checksum field contains the one’s complement of the one’s complement sum of the whole message; it helps detect message errors. The RSVP_Length field indicates the total length of the RSVP message in bytes.
Vers | Flags | Msg Type | RSVP Checksum
Send_TTL | (Reserved) | RSVP Length
Figure 13. RSVP message common header.
The Msg_Type field specifies the type of the message. There are 7 messages defined in traditional RSVP: Path, Resv, PathErr, ResvErr, PathTear, ResvTear, and ResvConf.
RSVP-TE added Hello message for the purpose of node failure detection. When RSVP-
TE was enhanced for GMPLS, Notify message was introduced to support fast failure noti-
fication.
RSVP messages are carried directly in IP packets; the corresponding protocol number in the IP header is 46. The IP header contains a TTL field. The RSVP common header also has a Send_TTL field, which records the TTL value at the transmitting node. This field can be used to detect the presence of non-RSVP-capable nodes: if the received TTL in the IP header differs from the transmission TTL stored in the common header, non-RSVP nodes must exist between the two neighboring RSVP-capable nodes.
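The RSVP_Checksum computation is the standard Internet checksum. As a sketch (ours, not the thesis VHDL): the checksum field is zeroed, the message is summed as big-endian 16-bit words, carries are folded back in, and the result is complemented. Recomputing over a message that carries a correct checksum yields zero.

```c
#include <stddef.h>
#include <stdint.h>

/* One's complement of the one's complement sum of the whole message.
 * The RSVP_Checksum field must be zero when this is computed. */
uint16_t rsvp_checksum(const uint8_t *msg, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(msg[i] << 8) | msg[i + 1];  /* 16-bit words */
    if (len & 1)                        /* odd length: pad with a zero byte */
        sum += (uint32_t)msg[len - 1] << 8;
    while (sum >> 16)                   /* fold carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;              /* one's complement */
}
```

A receiver can verify a message by summing it including the stored checksum and checking for zero.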
3.2.1.2 RSVP messages
In total, nine RSVP messages are defined in [11][15][16]. Among these messages, Path
and Resv messages are used to set up an LSP, PathTear/ResvTear is used to tear down an
LSP. These four messages should be processed by hardware. ResvConf message, used to
confirm a Resv message, is triggered by an optional object, RESV_CONFIRM, in the Resv message. We assume that normally there is no ResvConf message in our application. All other messages, PathErr, ResvErr, Hello, and Notify, are non-time-critical. These four messages, together with ResvConf, should be processed by software.
This section details the format of the four RSVP messages to be processed by hard-
ware. Messages are defined in Backus-Naur Form (BNF)1 format. We omit all optional
objects in the messages. The order of the objects in a message is recommended, but not
mandatory, meaning an implementation should create messages with the objects in the
order shown below, but accept the objects in any permissible order.
1. Path message
A Path message is used by an upstream Label Switching Router (LSR) to ask a down-
stream LSR to allocate a label for an LSP. More generally, it is a request to set up an LSP.
There are 6 mandatory objects in a Path message: SESSION, RSVP_HOP, TIME_VALUES, LABEL_REQUEST, SENDER_TEMPLATE, and SENDER_TSPEC.
<Path Message>::= <Common Header>
<SESSION>
<RSVP_HOP>
<TIME_VALUES>
<LABEL_REQUEST>
<sender descriptor>
<sender descriptor>::= <SENDER_TEMPLATE>
<SENDER_TSPEC>
1. Backus-Naur Form, introduced by John Backus and improved by Peter Naur, is one of the most commonly used meta-syntactic notations for specifying the syntax of programming languages, command sets, and the like. The meta-symbols of BNF are: ::= meaning “is defined as”; | meaning “or”; <> angle brackets used to surround item names; [] square brackets used to surround optional items. The BNF implies an order for the items.
2. Resv Message
A Resv message is used by a downstream LSR to advertise the binding of a label to a
specific LSP. After a label is allocated for an LSP, a Resv message, which carries the label
assigned to the LSP, is sent to the upstream LSR. There are 7 mandatory objects in a Resv message: SESSION, RSVP_HOP, TIME_VALUES, STYLE, FLOWSPEC, FILTER_SPEC, and LABEL.
<Resv Message>::= <Common Header>
<SESSION>
<RSVP_HOP>
<TIME_VALUES>
<STYLE>
<flow descriptor list>
<flow descriptor list>::= <FLOWSPEC>
<FILTER_SPEC>
<LABEL>
3. PathTear and ResvTear Messages
In RSVP-TE, either the source or the destination can tear down an LSP. The source node tears
down an LSP by sending out a PathTear message, while the destination node can do the
same by sending out a ResvTear message.
<PathTear Message>::= <Common Header>
<SESSION>
<RSVP_HOP>
<sender descriptor>
<sender descriptor>::= <SENDER_TEMPLATE>
<ResvTear Message> :: =<Common Header>
<SESSION>
<RSVP_HOP>
<STYLE>
<flow descriptor list>
<flow descriptor list>::= <FLOWSPEC>
<FILTER_SPEC>
3.2.2 RSVP objects
Objects following the common header are the real carriers of message fields. Each
object, a Type-Length-Value (TLV) triplet, is a self-contained element, consisting of one
or more 32-bit words with a one-word header. Fig. 14 shows the construct of an RSVP
object.
As the name suggests, a TLV-structured object consists of a Type field, a Length field,
and a variable length Value field. For RSVP, the Type field can be further split into Class-
Num field and C-Type field. The Class-Num field specifies the object class, while the C-
Type field further qualifies the type within the object class. The Length field specifies the
length of the object. Fig. 15 shows the SESSION object as an example. The SESSION object is mandatory in every RSVP message; its Class-Num is 1. In [16], the SESSION object is used to identify an RSVP session, with a corresponding C-Type of 1. For this
purpose, the object contents include the destination address, protocol ID, and destination
port number (i.e., application running on the destination for which this connection is being
established), as shown in Fig. 15a. With the extension of RSVP to RSVP-TE, a new C-
Type was defined (C-Type of 7). The new SESSION object, including destination address,
Length (bytes) Class-Num C-Type
(Object contents)...
Figure 14. RSVP object.
tunnel ID, and extended tunnel ID, is used to identify an LSP tunnel.
As RSVP-TE and related signaling protocols evolve, some fields and parameters become obsolete while new ones are introduced. The TLV-structured object is flexible and extensible, and thus well suited to these evolving protocols. In this section, we discuss all mandatory objects
in Path, Resv, PathTear, and ResvTear messages.
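The object layout described above (16-bit Length in bytes including the 4-byte header, 8-bit Class-Num, 8-bit C-Type, then the value) can be walked with a simple parser. The following C sketch is our illustration of that dispatch step; the struct and function names are ours.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t length;     /* total object length in bytes, incl. header */
    uint8_t  class_num;  /* object class */
    uint8_t  c_type;     /* type within the object class */
    const uint8_t *value;
} rsvp_obj;

/* Parse one TLV object at buf; return bytes consumed, or -1 on error.
 * Wire fields are big-endian. */
int parse_object(const uint8_t *buf, size_t avail, rsvp_obj *out)
{
    if (avail < 4)
        return -1;
    out->length    = (uint16_t)((buf[0] << 8) | buf[1]);
    out->class_num = buf[2];
    out->c_type    = buf[3];
    out->value     = buf + 4;
    /* length must cover the header, be 32-bit aligned, and fit */
    if (out->length < 4 || (out->length & 3) || out->length > avail)
        return -1;
    return out->length;
}
```

A message body is processed by calling this in a loop, advancing by the returned length, and dispatching on (class_num, c_type).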
3.2.2.1 FILTER_SPEC/SENDER_TEMPLATE object
The FILTER_SPEC/SENDER_TEMPLATE object specifies the source address of an
LSP. The former is carried in the Resv message with a Class-Num of 10, while the latter is carried in the Path message with a Class-Num of 11 (Fig. 16). C-Type 7 is defined for IPv4 tunnels.
FILTER_SPEC/SENDER_TEMPLATE object, together with SESSION object, forms a five
tuple <Src_IP_Addr, LSP_ID, Dest_IP_Addr, Tunnel_ID, Ext_Tunnel_ID>, which
uniquely identifies an LSP.
The 32-bit IPv4 tunnel sender address is the IP address of the sender, and the 16-bit LSP ID is a locally (at the sender) allocated ID for the LSP. The FILTER_SPEC/SENDER_TEMPLATE object remains unchanged along the whole LSP.
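The five-tuple that uniquely identifies an LSP can be modeled as a lookup key. This C sketch mirrors the field names in the text; it is our illustration of how the connection state could be keyed, not the thesis's register layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* <Src_IP_Addr, LSP_ID, Dest_IP_Addr, Tunnel_ID, Ext_Tunnel_ID> */
typedef struct {
    uint32_t src_ip_addr;    /* IPv4 tunnel sender address       */
    uint16_t lsp_id;         /* sender-allocated LSP ID          */
    uint32_t dest_ip_addr;   /* IPv4 tunnel end point address    */
    uint16_t tunnel_id;
    uint32_t ext_tunnel_id;
} lsp_key;

/* Two messages refer to the same LSP iff all five fields match. */
bool lsp_key_equal(const lsp_key *a, const lsp_key *b)
{
    return a->src_ip_addr == b->src_ip_addr &&
           a->lsp_id == b->lsp_id &&
           a->dest_ip_addr == b->dest_ip_addr &&
           a->tunnel_id == b->tunnel_id &&
           a->ext_tunnel_id == b->ext_tunnel_id;
}
```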
3.2.2.2 FLOWSPEC/SENDER_TSPEC object
The FLOWSPEC/SENDER_TSPEC object specifies the required traffic parameters of
Figure 15. TLV structure of the SESSION object (generic TLV: Length | Type | Value).
(a) SESSION object defined in RFC 2205 (RSVP): Length (12) | Class-Num (1) | C-Type (1); IPv4 Dest Address; Protocol ID | Flags | Dst Port.
(b) SESSION object defined in RFC 3209 (RSVP-TE): Length (16) | Class-Num (1) | C-Type (7); IPv4 tunnel end point address; Must be zero | Tunnel ID; Extended Tunnel ID.
Figure 16. FILTER_SPEC/SENDER_TEMPLATE object: Length | Class-Num (10/11) | C-Type (7); IPv4 tunnel sender address; MUST be zero | LSP ID.
an LSP. The former is carried in the Resv message (Class-Num 9), the latter in the Path message (Class-Num 12), Fig. 17. The FLOWSPEC and SENDER_TSPEC objects were first introduced in [16], where the interpretation of these two objects follows IntServ [37]. RSVP-TE (GMPLS) with extensions supporting SONET/SDH introduced a new C-Type (TBA).
A SONET/SDH signal has a hierarchical structure. Each signal has an elementary signal, which can be further concatenated (contiguously or virtually) to form higher-rate signals. The Signal_Type field specifies the type of the elementary signal. The RCC field is used to request contiguous concatenation. If the RCC field is ‘1’, contiguous concatenation is used, and the NCC field indicates the number of identical elementary signals that are requested to be concatenated. The NVC field indicates the number of signals that are requested to be virtually concatenated. Based on all previous fields, the MT field indicates
the number of identical signals that are requested for the LSP, i.e., that form the final sig-
nal. These signals can be either identical elementary signals, or identical contiguously
concatenated signals, or identical virtually concatenated signals.
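The field semantics above can be condensed into a small helper that derives the number of elementary signals a request consumes: NCC signals per component if contiguous concatenation is requested, NVC if virtual concatenation is used, one otherwise, and MT identical components in total. This is our reading of the text, sketched in C, not a standard API.

```c
#include <stdint.h>

/* Number of elementary signals requested by a SENDER_TSPEC/FLOWSPEC,
 * given the RCC, NCC, NVC and MT fields described in the text. */
uint32_t requested_elementary_signals(uint8_t rcc, uint16_t ncc,
                                      uint16_t nvc, uint16_t mt)
{
    uint32_t per_component = 1;          /* a single elementary signal   */
    if (rcc & 1)
        per_component = ncc;             /* contiguous concatenation     */
    else if (nvc > 0)
        per_component = nvc;             /* virtual concatenation        */
    return (uint32_t)mt * per_component; /* MT identical components      */
}
```

For example, one STS-3c over STS-1 elementary signals is RCC=1, NCC=3, MT=1, i.e., three elementary signals.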
3.2.2.3 Generalized LABEL object
The LABEL object was initially defined in [16] to support the setup of an LSP tunnel,
Class-Num 16, C-Type 1. In [13]-[15], the LABEL object is generalized to represent time-
slot, wavelength, switch port, etc., as shown in Fig. 18. The generalized LABEL object,
carried in Resv message, extends the traditional label by allowing the representation of not
Length Class-Num (9/12) C-Type (TBA)
Signal Type RCC NCC
NVC Multiplier (MT)
Transparency (T)
Figure 17. FLOWSPEC/SENDER_TSPEC object.
only labels which travel in-band with associated data packets, but also labels which identify time-slots, wavelengths, or space-division multiplexed positions. In our application,
SONET, a label represents a set of time-slots.
Each generalized LABEL object carries a variable length label. The format of the label
depends on the application. Fig. 19 shows the format of a SONET/SDH label. The expla-
nations of these fields are given in Fig. 20.
3.2.2.4 Generalized LABEL_REQUEST object
The LABEL_REQUEST object defines the characteristics of the requested LSP. It was
initially defined in [11] to support the setup of an LSP tunnel. In [13]-[15], a generalized
LABEL_REQUEST object was introduced, as shown in Fig. 21. The suggested C-Type is 4.
Figure 18. LABEL object (Length | Class-Num (16) | C-Type (2); variable-length Label).
Figure 19. Format of a SONET/SDH label (fields S, U, K, L, M).
Figure 20. SONET multiplexing hierarchy and generalized label. For example, S>0, U=1, K=0, L=0, M=0 denotes the unstructured SPE of the s-th STS-3 and the 1st STS-1.
The 8-bit LSP Encoding Type indicates the encoding of the LSP being requested. The
value for SONET/SDH is 5. The 8-bit Switching Type specifies the type of switching per-
formed on a link. The value for “Time-Division-Multiplex Capable” is 100 (our implemen-
tation is for transit SONET switches, which use TDM technology). G-PID identifies the
payload carried by the LSP.
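The LABEL_REQUEST contents fit in a single 32-bit word: 8-bit LSP Encoding Type, 8-bit Switching Type, 16-bit G-PID. As a sketch (our helper, using the values from the text: encoding 5 for SONET/SDH and switching type 100 for TDM-capable):

```c
#include <stdint.h>

/* Pack the generalized LABEL_REQUEST contents word:
 * bits 31-24: LSP Encoding Type, bits 23-16: Switching Type,
 * bits 15-0: G-PID. */
uint32_t pack_label_request(uint8_t lsp_enc_type, uint8_t switching_type,
                            uint16_t gpid)
{
    return ((uint32_t)lsp_enc_type << 24) |
           ((uint32_t)switching_type << 16) |
           gpid;
}
```

With encoding 5 and switching type 100 (0x64), the word is 0x0564 followed by the G-PID.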
3.2.2.5 RSVP_HOP object
The RSVP_HOP object was initially defined in [16], Class-Num 3, C-Type 1-2. In
[15], in order to support out-of-band signaling (separation of control channel and data
channel), a new C-Type, with support for interface ID, was added. The suggested C-Type
value is 3.
In RSVP, the RSVP_HOP object carries the IP address of the RSVP-capable node that sent this message; it is called PHOP (“previous hop”) in downstream messages (Path messages) or NHOP (“next hop”) in upstream messages (Resv messages). When RSVP-TE is extended for GMPLS, in order to support the separation of control channel and data channel, an IF_INDEX TLV is added, Fig. 22.
The 32-bit IPv4 Next/Previous Hop Address specifies the IP address of the previous
Length | Class-Num (19) | C-Type (4)
LSP Enc. Type | Switching Type | G-PID
Figure 21. LABEL_REQUEST object.
Length (24) | Class-Num (3) | C-Type (3)
IPv4 Next/Previous Hop Address
Logical Interface Handle (LIH)
IF_INDEX TLV: Type (3) | Length (12) | IP Address | Interface ID
Figure 22. RSVP_HOP object.
LSR (in a Path message) or next LSR (in a Resv message) on the control plane. The 32-bit
LIH specifies the logical outgoing interface on which the reservation is required.
IF_INDEX itself is a TLV. Inside it, the 32-bit IP Address specifies the IP address of the
previous LSR or next LSR on the user plane. The 32-bit Interface ID can be used to iden-
tify a physical interface.
3.2.2.6 SESSION object
The SESSION object was initially defined in [16], Class-Num 1, C-Type 1/2. Refer-
ence [11] defined C-Types 7/8 for LSP tunnel. In our application, C-Type is 7, meaning
IPv4 tunnel, Fig. 23. SESSION object is carried in all four RSVP-TE messages. As men-
tioned before, SESSION object, together with SENDER_TEMPLATE/FILTER_SPEC
object, uniquely identifies an LSP.
The 32-bit IPv4 tunnel end point address specifies the destination address of an LSP.
The 16-bit tunnel ID and 32-bit extended tunnel ID are used to identify an LSP tunnel.
The STYLE and TIME_VALUES objects are also mandatory in [16]. However, these
two objects become obsolete in [11][15] when RSVP was extended for MPLS and
GMPLS. We include these two objects in the subset because they are mandatory. We
expect that the hardware will accept but not process these two objects. Therefore the
details of these two objects are omitted.
3.2.3 Optional objects support
Besides the mandatory objects described above, we also support several optional
Length Class-Num (1) C-Type (7)
IPv4 tunnel end point address
Must be zero Tunnel ID
Extended Tunnel ID
Figure 23. SESSION object.
objects in the subset, i.e., SUGGESTED_LABEL, MESSAGE_ID and MESSAGE_ID_
ACK. Although these objects are currently optional, we expect them to be widely adopted
in implementations of RSVP-TE for GMPLS networks.
3.2.3.1 SUGGESTED_LABEL object
The SUGGESTED_LABEL object was first introduced in [15] for the purpose of allo-
cating labels in the forwarding direction, Class-Num 129, C-Type 2. The
SUGGESTED_LABEL object has the same structure as the generalized LABEL object.
There is a potential conflict in SONET networks if the SUGGESTED_LABEL object in the Path message is left as optional. Consider the following scenario. An outgoing interface
(STS-12) has four available time slots, “000 011 111 111.” A first call requests an STS-3c;
the upstream switch tentatively reserves the first three time slots. A second call requesting
an STS-1 arrives next, and needs to be routed on this same outgoing interface. The switch
will then make a tentative reservation of the remaining time slot. If the Resv message for
the second call returns first, and the downstream switch assigns a time slot different from
the one tentatively reserved in forward direction, the first call for which a tentative reser-
vation was made can no longer be accommodated because it requires a concatenated
assignment. Hence, we recommend that the SUGGESTED_LABEL object be mandatory in
Path messages to force the downstream switch to use the label selected by the upstream
switch. This is the result of RSVP-TE growing out of RSVP, which was developed as a
protocol for receiver-initiated additions to a multicast tree. In GMPLS networks, where
hard resource reservations of time slots and wavelengths are necessary, a reservation and
corresponding time-slot/wavelength selection need to be made in the forward direction of
call setup.
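The conflict in the scenario above can be made concrete with a small check: an STS-3c needs three free time slots aligned on an STS-3 boundary. This C sketch uses our own bit convention (bit i of the vector is time slot i+1, ‘0’ = free, ‘1’ = used, as in the text’s “000 011 111 111” example) and is illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Can an STS-3c still be placed on a 12-slot (STS-12) interface?
 * `used` holds one bit per slot in bits 0..11; an STS-3c needs
 * three free slots aligned on an STS-3 boundary (1-3, 4-6, 7-9, 10-12). */
bool sts3c_fits(uint16_t used)
{
    for (int base = 0; base < 12; base += 3)
        if (((used >> base) & 0x7) == 0)   /* three aligned free slots */
            return true;
    return false;
}
```

In the text's scenario, slots 1-4 are free (used = 0xFF0) and an STS-3c fits in slots 1-3. If the downstream switch grants the STS-1 call slot 1 instead of the tentatively reserved slot 4, the free slots become 2-4 (used = 0xFF1) and no aligned triple remains, which is exactly the conflict the mandatory SUGGESTED_LABEL avoids.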
3.2.3.2 MESSAGE_ID/MESSAGE_ID_ACK object
The MESSAGE_ID and MESSAGE_ID_ACK objects were defined in [40] to support
reliable transmission of RSVP messages, the former with Class-Num 23, C-Type 1, the lat-
ter with Class-Num 24, C-Type 1, as shown in Fig. 24.
RSVP-TE runs on top of the connectionless, unreliable IP protocol. In [40], a TCP-like mechanism is introduced to provide reliable message transmission. This mechanism includes the MESSAGE_ID and MESSAGE_ID_ACK objects and an exponential back-off retransmission algorithm. Unlike TCP, there is no window-based flow control.
This mechanism is defined on a per hop basis. Each RSVP message (Path, Resv, Path-
Tear/ResvTear, etc.) includes a MESSAGE_ID object. This MESSAGE_ID object, together
with the message generator’s IP address, uniquely identifies a message. When a message
is received, the contents of the MESSAGE_ID object are copied to a MESSAGE_ID_ACK
object, the MESSAGE_ID_ACK object is then sent back to the message generator as an
acknowledgement. The MESSAGE_ID_ACK object can be piggy-backed in another RSVP
message, or carried in a separate Ack message. Multiple MESSAGE_ID_ACKs can be car-
ried in one RSVP message. If the message generator fails to get back the
MESSAGE_ID_ACK before a re-transmission timer expires, the message generator will
re-transmit the message. The time-out value of the re-transmission timer doubles per re-
transmission (exponential back-off).
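The retransmission policy just described can be sketched as a per-message timer: on each expiry without an acknowledgement the message is re-sent and the time-out doubles, up to a retry limit. The structure, names, and the notion of a configured retry limit are our assumptions for illustration; only the doubling behavior comes from the text.

```c
#include <stdint.h>

typedef struct {
    uint32_t timeout_ms;   /* current re-transmission time-out */
    int      tries_left;   /* retransmissions still allowed    */
} rexmit_timer;

void rexmit_init(rexmit_timer *t, uint32_t initial_ms, int retry_limit)
{
    t->timeout_ms = initial_ms;
    t->tries_left = retry_limit;
}

/* Called when the timer expires with no MESSAGE_ID_ACK received.
 * Returns the time-out (ms) to arm for this retransmission, or 0 to
 * give up. The time-out doubles per retransmission (back-off). */
uint32_t rexmit_expire(rexmit_timer *t)
{
    if (t->tries_left-- <= 0)
        return 0;
    uint32_t now = t->timeout_ms;
    t->timeout_ms *= 2;        /* exponential back-off */
    return now;
}
```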
Length Class-Num (23/24) C-Type (1)
Flags Epoch
Message_Identifier
Figure 24. MESSAGE_ID/MESSAGE_ID_ACK object.
3.2.4 Connection setup and teardown with RSVP-TE signaling messages
Similarly to Fig. 1, Fig. 25 shows the procedure to set up an end-to-end unidirectional
connection with RSVP-TE signaling messages. Path messages, carrying the destination
address, the required traffic parameters, and other message fields, progress from the
source to destination hop-by-hop, and Resv messages travel in the reverse direction, back
to the source. The same five steps as mentioned in Chapter 1 are carried out at each
switch. Since we support SUGGESTED_LABEL object in our subset, the first three steps
are performed in the forward direction, i.e., when Path messages are processed. Upon
completion of data exchange, either source or destination can release the connection. The
source node can initiate the release procedure with PathTear messages and the destination
node can do the same with ResvTear messages. In RSVP-TE, the release procedure is not
confirmed.
Figure 25. Illustration of connection setup with RSVP-TE messages: the source is connected to the destination through switches SW1-SW5 (message sequence in the figure: 1-4 Path, 5-8 Tear, 9 connection (circuit or virtual circuit) established).
Chapter 4 RSVP-TE Procedures for Hardware-
Acceleration
In Chapter 3, we extracted a subset of RSVP-TE for hardware acceleration. This sub-
set is described in a language of communication protocols. In this chapter, we describe the
procedures needed to implement this subset of RSVP-TE. Registers and data tables used
in the hardware implementation are defined.
Four types of messages are handled in hardware, Path, Resv, PathTear, and ResvTear.
As each message is received, it is parsed out and different fields of the objects are dis-
patched to corresponding registers. We list these registers in Section 4.1. Processing of
signaling messages involves reading and writing data tables. We describe these data tables
in Section 4.2. In Section 4.3, we use the Path message as an example to illustrate the detailed procedures to process a typical RSVP-TE signaling message.
4.1 Registers
Registers are commonly used in generic micro-processors, DSPs, ASICs, and other digital VLSIs to store data temporarily. We define 36 registers to buffer message fields and intermediate data. We also define 8 configuration registers.
Table 2. Data registers.
Register | Width (bits) | Description
Msg_Type | 8 | Message type, Msg Type in Common Header
Msg_Len | 16 | Message length, RSVP Length in Common Header
Local_Chksum | 16 | Locally calculated checksum
Send_TTL | 8 | Send TTL, Send_TTL in Common Header
Src_IP_Addr | 32 | Source IP address, IPv4 tunnel sender address in SENDER_TEMPLATE object within Path message or FILTER_SPEC object within Resv message
LSP_ID | 16 | LSP ID in SENDER_TEMPLATE object within Path message or FILTER_SPEC object within Resv message
Dest_IP_Addr | 32 | Destination IP address, IPv4 tunnel end point address in SESSION object within Path/Resv message
Tunnel_ID | 16 | Tunnel ID in SESSION object within Path/Resv message
Ext_Tunnel_ID | 32 | Extended tunnel ID in SESSION object within Path/Resv message
Pre_IP_Addr_Ctrl | 32 | Previous IP address on control plane, IPv4 Previous Hop Address in RSVP_HOP object within Path message
Next_IP_Addr_Ctrl | 32 | Next IP address on control plane, the result of User/Control Mapping table look-up
LIH | 32 | Logical Interface Handle in RSVP_HOP object within Path message
Pre_IP_Addr_User | 32 | Previous IP address on user plane, IP Address in IF_INDEX TLV within Path message
Pre_IF_ID_User | 32 | Upstream node output interface ID, Interface ID in IF_INDEX TLV within Path message
Next_IP_Addr_User | 32 | Next hop IP address, the result of Routing table look-up
Seq_Num | 4 | Sequence number of the links between two neighboring LSRs
Incoming_IF_ID_User | 32 | Incoming interface ID, the result of Incoming Connectivity table look-up
Outgoing_IF_ID_User | 32 | Outgoing interface ID, the result of Outgoing Connectivity/CAC table look-up
Incoming_PHY_ID | 8 | Incoming physical interface ID
Outgoing_PHY_ID | 8 | Outgoing physical interface ID
Incoming_Assigned_Timeslots | 12 | Time-slots assigned to a certain incoming logical link
Outgoing_Assigned_Timeslots | 12 | Time-slots assigned to a certain outgoing logical link
Incoming_Label(s) | 32 | Generalized label(s) for the incoming interface, locally allocated
Outgoing_Label(s) | 32 | Generalized label(s) for the outgoing interface, received from downstream LSR
Signal_Type | 8 | Signal Type in SENDER_TSPEC object within Path message or FLOWSPEC object within Resv message
RCC | 8 | Requested Contiguous Concatenation in SENDER_TSPEC object within Path message or FLOWSPEC object within Resv message
NCC | 16 | Number of Contiguous Concatenation in SENDER_TSPEC object within Path message or FLOWSPEC object within Resv message
NVC | 16 | Number of Virtual Concatenation in SENDER_TSPEC object within Path message or FLOWSPEC object within Resv message
MT | 16 | Multiplier in SENDER_TSPEC object within Path message
State | 4 | Current state of an LSP
Avail_BW | 12 | Available bandwidth, the result of Outgoing CAC table look-up
LSP_Enc_Type | 8 | LSP Encoding Type in LABEL_REQUEST object within Path message
Switching_Type | 8 | Switching Type in LABEL_REQUEST object within Path message
G-PID | 16 | Generalized PID in LABEL_REQUEST object within Path message
Refresh_Period | 32 | Refresh Period in TIME_VALUES object within Path message
Style | 5 | Option vector in STYLE object within Resv message
Table 3. Configuration registers.
Register | Width (bits) | Init value | Meaning
Init_LSP_Enc_Type | 8 | 5 | SDH ITU-T G.707/SONET ANSI T1.105
Init_Switching_Type | 8 | 100 | Time-Division-Multiplex Capable (TDM)
Local_IP_Addr_Ctrl | 32 | - | Local IP address on the control plane
Local_IP_Addr_User | 32 | - | Local IP address on the user plane
Local_MAC_Addr_ctrl | 48 | - | Local Ethernet MAC address on the control plane
Supported_ST | 8 | 1 | The cross-connect rate of the switch (1 means OC1)
TO | 32 | - | The initial time-out value of the re-transmission timer
PTO | 32 | - | The time-out value of the piggy-back timer
4.2 Data tables
Fig. 26 is an illustrative network that shows: (i) switches can be connected via many
Figure 26. Network used for illustrative examples. A network of IP routers (not necessarily the Internet; it could be a specially designed IP network just for signaling traffic) connects the signaling engines of switches I, II, and III; a server-to-client SONET unidirectional circuit crosses the switches over logical links, with physical interface numbers (e.g., 5, 2, 7) marked on the links.
user-plane interfaces; (ii) logical links can be provisioned between non-adjacent switches; and (iii) a signaling engine may have many signaling links leading to the connectionless signaling network (IP network). We will extensively refer to this figure when we discuss the
data tables.
4.2.1 Routing table
The Routing table, as shown in Table 4, is initialized and maintained by a software
routing module (for example, an implementation of OSPF-TE for GMPLS). It is read-only
to the hardware signaling accelerator. The Next_IP_Addr_User is the user-plane IP
address for the next hop switch through which the destination IP address can be reached.
Routing table look-up is a longest-prefix match. In general, a routing table can have alternative next-hop switches. However, for simplicity, we do not implement multiple routing table look-ups in the hardware signaling accelerator. It attempts only one route; if resources are not available, it passes the message off to the software signaling process. We will re-consider this design decision in a subsequent release.
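The longest-prefix-match look-up on Table 4 can be sketched as a linear scan that keeps the matching row with the longest (numerically largest, for contiguous masks) subnet mask. A hardware implementation would use a TCAM or trie instead; the C below is our illustration, with field names taken from Table 4.

```c
#include <stdint.h>

typedef struct {
    uint32_t network_prefix;     /* index columns               */
    uint32_t subnet_mask;
    uint32_t next_ip_addr_user;  /* return value: next-hop IP   */
} route_entry;

/* Longest-prefix match: returns Next_IP_Addr_User, or 0 if no
 * entry matches the destination address. */
uint32_t route_lookup(const route_entry *tab, int n, uint32_t dest)
{
    uint32_t best_mask = 0, next_hop = 0;
    int found = 0;
    for (int i = 0; i < n; i++) {
        if ((dest & tab[i].subnet_mask) == tab[i].network_prefix &&
            (!found || tab[i].subnet_mask > best_mask)) {
            found = 1;
            best_mask = tab[i].subnet_mask;  /* longer prefix wins */
            next_hop = tab[i].next_ip_addr_user;
        }
    }
    return next_hop;
}
```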
Table 4. Routing table.
  Index: Network_Prefix, Subnet_Mask
  Return value: Next_IP_Addr_User

4.2.2 Incoming Connectivity table
The Incoming Connectivity table, shown in Table 5, is initialized and updated by a software link management module. Like the Routing table, it is read-only to the hardware signaling accelerator. It shows how the interface number used by a neighbor maps to an incoming interface number and the corresponding physical interface.

Table 5. Incoming Connectivity table.
  Index: Pre_IP_Addr_User, Pre_IF_ID_User
  Return value: Incoming_IF_ID_User, Incoming_PHY_ID, Incoming_Assigned_Timeslots

The Incoming_Assigned_Timeslots column is a bit-vector with '1's for the time-slots assigned to this logical interface and '0's for the other time-slots.
Using the example shown in Fig. 26, the incoming connectivity table at switch II will
have entries as shown in Table 6. Here we assume that time-slots 1-3 of the physical inter-
face 5 are terminated at switch II while the remaining time-slots 4-12 (we assume all inter-
faces are STS-12s) are used for the logical link passing through switch II to terminate at
switch III. On the other hand, we assume that all 12 time-slots of physical interface 2 ter-
minate at switch II. In cases where all time-slots of a physical interface terminate at a
switch, we use the same number for physical interface as for the interface ID used in sig-
naling messages. Otherwise, a logical interface number that maps to a physical interface
and time-slots is used in signaling messages to identify logical links.
4.2.3 Outgoing Connectivity table
The Outgoing Connectivity table, Table 7, shows the outgoing interfaces leading to
neighboring switches. It also maintains mapping of logical interface IDs to physical inter-
face IDs and corresponding time-slots. Since a switch may have multiple links to a neigh-
bor, there may be many rows in this data table corresponding to a single
Next_IP_Addr_User. Hence the Seq_Num column allows the hardware signaling accelerator to search this table repeatedly until it either finds an interface to the neighboring switch with sufficient available resources or exhausts all interfaces to that switch.

Table 6. Incoming Connectivity table at switch II in Fig. 26.
  Pre_IP_Addr_User     Pre_IF_ID_User  |  Incoming_IF_ID_User  Incoming_PHY_ID  Incoming_Assigned_Timeslots
  Switch I IP address  4               |  2                    2                1111 1111 1111
  Switch I IP address  6               |  1                    5                1110 0000 0000

Table 7. Outgoing Connectivity table.
  Index: Next_IP_Addr_User, Seq_Num
  Return value: Outgoing_IF_ID_User, Outgoing_PHY_ID, Outgoing_Assigned_Timeslots
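The Seq_Num search can be sketched as follows (a Python behavioral model, not the TCAM implementation; the table contents are hypothetical, loosely modeled on the switch I example):

```python
# Behavioral sketch of the Seq_Num search over the Outgoing Connectivity table.
# Keyed by (next_ip, seq_num); values: (outgoing_if_id, outgoing_phy_id, assigned_slots).

def find_outgoing_interface(conn_table, cac_table, next_ip, slots_needed):
    """Try Seq_Num = 0, 1, ... until an interface with enough free time-slots is found."""
    seq = 0
    while (next_ip, seq) in conn_table:
        if_id, phy_id, assigned = conn_table[(next_ip, seq)]
        # Free slots on this logical link: assigned to the link AND still available per CAC.
        free = [a and b for a, b in zip(assigned, cac_table[phy_id])]
        if sum(free) >= slots_needed:
            return if_id, phy_id
        seq += 1
    return None  # exhausted all interfaces to this neighbor

# Hypothetical tables (bit-vectors as 0/1 lists, MSB first).
conn = {
    ("switchII", 0): (6, 5, [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
    ("switchII", 1): (4, 4, [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
}
cac = {5: [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
       4: [0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1]}
```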
An example Outgoing Connectivity table for switch I in Fig. 26 is shown in Table 8. Time-slots 1-3 on physical interface 5 at switch I terminate at switch II, while the remaining time-slots are routed through as a logical link to switch III.

Table 8. Outgoing Connectivity table for switch I in Fig. 26.
  Next_IP_Addr_User      Seq_Num  |  Outgoing_IF_ID_User  Outgoing_PHY_ID  Outgoing_Assigned_Timeslots
  Switch II IP address   0        |  6                    5                1110 0000 0000
  Switch II IP address   1        |  4                    4                1111 1111 1111
  Switch III IP address  0        |  7                    5                0001 1111 1111

4.2.4 Outgoing CAC table
The Outgoing CAC table, Table 9, maintains the available time-slots on each physical outgoing interface. Table 10 shows an example Outgoing CAC table.

Table 9. Outgoing CAC table.
  Index: Outgoing_PHY_ID
  Return value: Avail_BW

Table 10. Outgoing CAC table for switch I in Fig. 26.
  Outgoing_PHY_ID  Avail_BW
  5                1111 1111 1000
  4                0110 0011 1111
  5                1111 1111 1111

Avail_BW is a bit-map of the available time-slots on a given outgoing interface: '1' means the corresponding time-slot is available, and '0' means it is not. Fig. 27 shows an example of Avail_BW. A request for an STS-1 can be satisfied, but a request for an STS-3c cannot: even though 8 time-slots are available, the contiguous-concatenation requirement cannot be met. "Reserving time-slots" means setting the corresponding bits to "0."
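These checks can be sketched in a few lines of Python (a behavioral model; the hardware performs the same tests combinationally):

```python
# Behavioral sketch of Avail_BW checks (12 time-slots per STS-12 interface).
# Avail_BW is a list of 12 bits, MSB first; '1' = available.

def can_reserve_sts1(avail, n):
    """n independent STS-1s need n available slots anywhere."""
    return sum(avail) >= n

def can_reserve_sts3c(avail):
    """An STS-3c needs one aligned group of 3 contiguous available slots."""
    return any(all(avail[i:i + 3]) for i in range(0, 12, 3))

def reserve(avail, positions):
    """Reserving time-slots means setting the corresponding bits to 0."""
    for p in positions:
        avail[p] = 0

# The Avail_BW of Fig. 27: 8 slots free, yet no aligned group of 3.
fig27 = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0]
```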
4.2.5 User/Control Mapping table
Unlike packet-switched MPLS networks, where implicit in-band signaling is used and there is a one-to-one association between control channels and data links, circuit-switched networks may require out-of-band signaling, and hence a separation of control channels from the corresponding data links. The User/Control Mapping table, Table 11, is used to map
user plane IP addresses of neighboring switches to the corresponding control plane IP
addresses.
Our implementation supports multiple data links between two neighboring LSRs, as
shown in Fig. 26. A switch may also have multiple control plane interfaces as shown in
Fig. 26 for switch II. We assume that load balancing of signaling load across multiple con-
trol plane interfaces will be done outside the signaling module and appropriate data down-
loaded to this User/Control Mapping table. We assume that there is only one
Next_IP_Addr_Ctrl associated with each next-hop switch.
Table 11. User/Control Mapping table.
  Index: Next_IP_Addr_User
  Return value: Next_IP_Addr_Ctrl

Figure 27. Example of Avail_BW: the bit-vector 1 1 0 1 1 0 1 1 0 1 1 0, where '1' means available and '0' means not available (reserved).
4.2.6 State table
The State table, shown in Table 12, is the most complex of the data tables. It consists of two parts: an index part (Src_IP_Addr, LSP_ID, Dest_IP_Addr, Tunnel_ID, and Ext_Tunnel_ID), which uniquely identifies an LSP, and a return-value part, which holds the many parameters of the LSP.
4.3 Procedures
After an RSVP-TE signaling message is parsed, fields from within its objects are sent to the registers defined in Section 4.1. The parsed message is then processed according to its message type; in this process, the data tables defined in Section 4.2 are consulted and updated. In this section, we use the Path message as an example to detail the processing of an RSVP-TE signaling message. Within the Path message, we focus on the processing of the SESSION object.
Fig. 28 shows the processing of the common header. After a message is completely received, the checksum and the protocol version are verified. If no error is detected, the fields in the common header are saved in registers. The message is then processed further according to the decoded message type. In this example, a Path message is detected (Msg_Type=1).

Table 12. State table.
  Index (global connection reference): Src_IP_Addr, LSP_ID, Dest_IP_Addr, Tunnel_ID, Ext_Tunnel_ID
  Return value, control plane: Pre_IP_Addr_Ctrl, LIH, Next_IP_Addr_Ctrl
  Return value, data plane: Pre_IP_Addr_User, Pre_IF_ID_User, Incoming_Assigned_Timeslots, Outgoing_Assigned_Timeslots, Incoming_IF_ID_User, Incoming_PHY_ID, Incoming_Label(s), Next_IP_Addr_User, Outgoing_IF_ID_User, Outgoing_PHY_ID, Outgoing_Label(s), Traffic_Spec, State
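The common-header handling described above can be sketched as follows (a Python behavioral model of the RFC 2205 common header, not the VHDL; the helper names are ours):

```python
# Behavioral sketch of common-header processing (Fig. 28): verify checksum and
# version, latch the header fields into "registers", and dispatch on Msg_Type.
import struct

MSG_NAMES = {1: "Path", 2: "Resv", 5: "PathTear", 6: "ResvTear"}

def ones_complement_checksum(data):
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return (~total) & 0xFFFF

def process_common_header(msg):
    vers_flags, msg_type, chksum, send_ttl, _, length = struct.unpack("!BBHBBH", msg[:8])
    if vers_flags >> 4 != 1:
        return None                       # unsupported protocol version
    if ones_complement_checksum(msg) != 0:
        return None                       # checksum error: drop the message
    # Registers: Msg_Type, Msg_Len, decremented Send_TTL.
    regs = {"Msg_Type": msg_type, "Msg_Len": length, "Send_TTL": send_ttl - 1}
    regs["Msg_Name"] = MSG_NAMES.get(msg_type, "other")
    return regs

def make_msg(msg_type, ttl, body=b""):
    """Build a message with a valid checksum, for testing the sketch."""
    length = 8 + len(body)
    hdr = struct.pack("!BBHBBH", 0x10, msg_type, 0, ttl, 0, length) + body
    ck = ones_complement_checksum(hdr)
    return hdr[:2] + struct.pack("!H", ck) + hdr[4:]
```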
The body of a Path message consists of several self-contained objects. Fig. 29 shows the processing of an object header. The Class-Num, C-Type, and Length of the object are extracted, and the object is processed further based on its type. In this example, it is a SESSION object (Class-Num=1).
Figure 28. Processing of the Common header. [Flowchart: a message arrives; Local_Chksum is calculated and compared with the RSVP checksum; the protocol version is checked (Vers=1?); the fields Msg_Type, Msg_Len, and (Send_TTL-1) are saved in registers; processing then branches on Msg_Type: 1 = Path, 2 = Resv, 5 = PathTear, 6 = ResvTear; other values are not handled.]
Figure 29. Processing of a Path message. [Flowchart: Obj_Class <- Class-Num, Obj_Type <- C-Type, Obj_Len <- Length; the object is dispatched on Obj_Class: 1 = SESSION, 3 = RSVP_HOP, 5 = TIME_VALUES, 11 = SENDER_TEMPLATE, 12 = SENDER_TSPEC, 19 = LABEL_REQUEST; other values are not handled.]
Fig. 30 shows the processing of the SESSION object. First the C-Type is verified; only C-Type 7 (IPv4 tunnel) is supported in the subset we defined. Then the fields inside the object are extracted and sent to the corresponding registers. The next step is to look up the Routing table with the given destination IP address. If a match is found, the returned value (the next-hop IP address) is used to search the Outgoing Connectivity table to find the corresponding outgoing interface. The Outgoing CAC table is then consulted to get the available bandwidth on that interface. With the bandwidth information fetched from the CAC table and the required traffic parameters extracted from the SENDER_TSPEC object, bandwidth (time-slot) reservation is attempted. If it succeeds, the reserved time-slots are marked, the Outgoing CAC table is updated, and a new SUGGESTED_LABEL object is created.
As mentioned in Chapter 1, setting up a connection involves five steps at each switch.
The first three steps, i.e., determining the next-hop switch, checking for the availability of
and reserving required resources, and assigning “labels” for the connection, are accom-
plished when a Path message is processed.
Figure 30. Processing of the SESSION object. [Flowchart: check that the C-Type is supported (7); load Dest_IP_Addr, Tunnel_ID, and Ext_Tunnel_ID from the object (Src_IP_Addr and LSP_ID come from the SENDER_TEMPLATE); check the State table for an existing entry F(State table, <Dest_IP_Addr, Tunnel_ID, Ext_Tunnel_ID, Src_IP_Addr, LSP_ID>); look up Next_IP_Addr_User <- F(Routing table, Dest_IP_Addr); fetch (Outgoing_IF_ID_User, Outgoing_PHY_ID, Outgoing_Assigned_Timeslots) <- F(Outgoing Connectivity table, <Next_IP_Addr_User, Seq_Num>); fetch Avail_BW <- F(Outgoing CAC table, Outgoing_PHY_ID); test whether Avail_BW satisfies <Signal_Type, MT, NCC> from the SENDER_TSPEC; if so, update Avail_BW (set the corresponding bits to 0's), write it back to the Outgoing CAC table, and assign a Suggested_Label.]
After all objects are successfully processed, the State table is updated, and a new Path message is assembled and transmitted to the next-hop address (control plane), as shown in Fig. 31.
Figure 31. At the end of Path message processing. [Flowchart: when all objects are done, update the State table, assemble the new Path message, and transmit it to the next hop.]
Chapter 5 An FPGA-Based Hardware Signaling
Accelerator
As mentioned before, complexity and the requirement for flexibility are two chal-
lenges for hardware implementations of signaling protocols. In Chapters 3 and 4, we
defined a subset of RSVP-TE for hardware acceleration, with the part beyond this subset
relegated to software. This approach can partly solve the challenge of complexity.
In order to address the challenge of flexibility, we propose to use re-configurable hard-
ware, i.e., FPGAs, as the hardware platform for our RSVP-TE implementation. These
devices are a compromise between general-purpose processors used in software imple-
mentations at one end of the flexibility-performance spectrum, and ASICs at the opposite
end of this spectrum. The latest FPGA devices, such as Xilinx Virtex-II [41][42], even
support runtime partial re-configuration [43]. We can re-configure FPGAs with updated
versions of implementations as signaling protocols evolve while significantly improving
the performance relative to software implementations on general-purpose processors.
Based on the subset defined in Chapter 3 and the detailed descriptions given in Chapter 4, we discuss an FPGA-based hardware signaling accelerator in this chapter. Section 5.1 gives an architecture-level view of the hardware signaling accelerator. We then detail the various functional modules in Section 5.2, present implementation and simulation results in Section 5.3, and finally discuss re-configurability in Section 5.4.
5.1 The architecture of the hardware signaling accelerator
Fig. 32 illustrates the architecture of the hardware signaling accelerator. It consists of three stages: message parsing, message processing, and message assembling. In the message parsing stage, signaling messages are buffered in the incoming message buffer for checksum verification; meanwhile, message objects are checked and their fields are dispatched to registers in the register bank. In the message processing stage, message objects are processed in parallel. Finally, the appropriate objects are re-assembled into a new message in the message assembling stage, and the new message is sent to the next switch. The three stages are fully pipelined to achieve high throughput.
5.2 Functional modules
In this section, we discuss the functional modules illustrated in Fig. 32.
5.2.1 Register bank
At the center of Fig. 32 we show a register bank. In Chapter 4, we described the registers that are necessary for our RSVP-TE implementation. Since message parsing, message processing, and message assembling are fully pipelined, each stage must be equipped with a set of registers. These three sets of registers form the register bank.

Figure 32. Architecture of the hardware signaling accelerator. [Block diagram: a GbE interface feeds the object dispatcher and incoming message buffer (message parsing); the register bank sits at the center, shared with the resource management (CAC table), data table management (TCAM and SRAM interfaces), and retransmission management modules (message processing); the message assembler (message assembling) sends the new message out; a PCI local bus interface, a FIFO interface, and a cross-connect interface connect to the rest of the system.]
Unlike in a regular pipelining scheme where each stage is attached to a fixed set of
registers, we propose a dynamic round-robin style pipelining scheme, as shown in Fig. 33.
In this scheme, a message, instead of a pipelining stage, is associated with a set of regis-
ters. Each register set has 3 flag bits indicating the pipelining stage of the register set and
associated signaling message. Similarly, each pipelining stage has 3 flag bits indicating
the register set associated with the message it is currently handling. When a message
enters the next pipelining stage, the associated flag bits change in a round-robin style. The
flag bits for each pipelining stage change in a similar round-robin style, but in the reverse
direction. For example, in Fig. 33, register sets 1, 2, and 3 are associated with three signaling messages, which are in the message assembling, message parsing, and message processing stages, respectively. When all stages are done with their current messages, the register sets enter the next stage, and all flag bits are updated.

Figure 33. Dynamic round-robin style pipelining. [Diagram: (a) regular pipelining; (b) round-robin style pipelining, in which register sets 1-3 in the register bank are connected to the message dispatching, processing, and assembling stages through a data flow arbitrator.]
All flag bits are connected to a data flow arbitrator, which controls the bidirectional
data flows between the register bank and three message handling units. Inside the data
flow arbitrator, there are 3 flag bits indicating the status of the three pipelining stages.
Only when all three stages are ready do the round-robin style flag bits rotate.
This scheme avoids data transfers between stages. It thus helps to reduce processing
delay and to increase throughput. The cost of this scheme is the extra arbitration logic and
flag bits.
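A minimal behavioral sketch of this rotation (Python, with the three-stage association modeled as a list rather than flag bits; the class and method names are ours):

```python
# Behavioral sketch of the dynamic round-robin pipelining scheme: a message
# stays in one register set; the stage <-> register-set binding rotates instead,
# so no data is ever copied between register sets.

STAGES = ["parsing", "processing", "assembling"]

class RegisterBank:
    def __init__(self):
        # stage_of_set[s] = index of the pipeline stage currently bound to
        # register set s (models the 3 flag bits per register set).
        self.stage_of_set = [0, 1, 2]

    def set_for_stage(self, stage_idx):
        """Which register set a stage is currently working on (the stage's flag bits)."""
        return self.stage_of_set.index(stage_idx)

    def rotate(self):
        """When all three stages are done, every register set enters the next
        stage; only the flag bits change, never the register contents."""
        self.stage_of_set = [(s + 1) % 3 for s in self.stage_of_set]
```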
5.2.2 Input message buffering system
In Fig. 32, parallel to the object dispatcher is an incoming message buffer. When a
message is received and dispatched to registers, it is simultaneously sent to this buffer,
which is a dual-port RAM inside the FPGA. The message is held in the buffer until it is
successfully processed, at which point it is flushed out. If any error occurs in the message parsing or processing stage, such as an unknown message/object/field, a routing table look-up failure, or a resource reservation failure, the message is moved to an external FIFO. The internal dual-port RAM, the external FIFO, and the related message buffer management module (also inside the FPGA) constitute a two-level message buffering system, as shown in Fig. 34. The internal dual-port RAM buffers the messages in the dispatching and processing stages, while the external FIFO stores the messages that cannot be handled by the hardware signaling accelerator and need software intervention. The external FIFO, which we will discuss in Chapter 6, works as an interface between the hardware signaling accelerator and the software signaling process.

Figure 34. Two-level input message buffering. [Diagram: an arriving message goes to the object dispatcher and, in parallel, to a 256 x 32 internal message buffer (dual-port RAM) inside the XC2V3000 FPGA; a message buffer management module moves error messages to a 64K x 32 external message buffer (FIFO).]
The internal dual-port RAM makes use of the BlockRAM resource in the Virtex-II FPGA device. The memory space is equally partitioned into four segments. Each segment is 256 bytes (64 x 32-bit words), large enough to contain a typical RSVP-TE message. Although at any given instant there are only two messages in the parsing and processing stages, there could be at most two error messages waiting to be transferred to the external FIFO; therefore the internal message buffer has four segments. The messages in the buffer are aligned to 256-byte segments instead of being stored contiguously. This approach simplifies buffer management by avoiding buffer fragmentation.
Each segment can be in one of four possible states: Free, in parsing stage (S1), in processing stage (S2), or Error. Correspondingly, we assign four flag bits to each segment, as shown in Fig. 35. Although the flag bits are physically associated with each buffer segment, the Free bits and the Error bits from all segments are logically organized into two separate queues. The "Free" segment queue indicates which segment is available to accommodate a new message. The "Error" segment queue indicates the segments waiting to be moved to the external FIFO. The queue pointers advance in round-robin style. The message buffer management module maintains the flag bits and the queues, controls the write/read operations to/from the internal dual-port RAM, and controls the write operation to the external FIFO.

Figure 35. Buffer management module. [Diagram: four 256-byte segments (64 32-bit words each, byte addresses 0-255), each with Free, S1, S2, and Error flag bits.]
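The segment states and the two queues can be sketched as follows (a Python behavioral model of the management logic; the class and method names are ours):

```python
# Behavioral sketch of the four-segment internal message buffer: each 256-byte
# segment is Free, S1 (parsing), S2 (processing), or Error; Free and Error
# segments are tracked as round-robin (FIFO) queues.
from collections import deque

class MsgBufMgr:
    def __init__(self):
        self.state = ["Free"] * 4
        self.free_q = deque([0, 1, 2, 3])   # segments available for a new message
        self.error_q = deque()              # segments waiting for the external FIFO

    def accept_message(self):
        seg = self.free_q.popleft()         # segment chosen for an arriving message
        self.state[seg] = "S1"
        return seg

    def advance(self, seg):                 # parsing done -> processing
        self.state[seg] = "S2"

    def finish(self, seg, ok):
        """Flush a processed segment, or queue it for the external FIFO on error."""
        if ok:
            self.state[seg] = "Free"
            self.free_q.append(seg)
        else:
            self.state[seg] = "Error"
            self.error_q.append(seg)

    def drain_error(self):                  # move one segment to the external FIFO
        seg = self.error_q.popleft()
        self.state[seg] = "Free"
        self.free_q.append(seg)
        return seg
```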
5.2.3 Two-level object dispatching
The TLV structure offers the flexibility required to introduce new objects as needed,
which is essential for protocols that keep evolving, like RSVP-TE. However, it adversely affects message parsing for parameter extraction, making it difficult to implement in hardware. For example, when processing an IP packet header, the hardware can always
extract the 5th word from the IP header to obtain the destination IP address. But in RSVP-
TE, since the SESSION object carrying the destination IP address can occur anywhere in a message, the hardware cannot extract the destination IP address from a fixed location.

Figure 36. TLV style object processing. [Diagram: an object dispatcher feeds nine per-object field dispatchers (e.g., a SESSION field dispatcher and an RSVP_HOP field dispatcher) and an unknown-object processor: two levels of dispatchers with distributed decoding.]
To address the challenge imposed by the flexible TLV structure, we propose a solution
with two-level dispatching and distributed decoding. Fig. 36 illustrates this solution; the figure is the RTL circuit generated by the Synplify synthesis tool from a VHDL model of our solution. There are two levels of dispatchers: an object dispatcher and nine field dispatchers, one per object type. The object dispatcher mainly consists of two registers and two counters.
The registers store the lengths of the message and the object being read. Two counters are
used to count the number of words received in an incoming message, one for the whole
message and the other for the current object. The message length counter and the message
length register are used to delimit a message. Similarly, objects are delimited according to
the object length register and the object length counter. The delimited object is sent to all
field dispatchers, while only the field dispatcher matching the object type is triggered (dis-
tributed decoding). The fields in the matched object are then dispatched into correspond-
ing registers. The unknown object processor captures all unsupported objects. If a
message contains such an object (i.e., one outside the set of objects defined in the RSVP-
TE subset), the message should be passed to the software signaling process.
5.2.4 Data table management
The processing of an RSVP-TE signaling message involves accessing multiple data
tables. Among the six data tables we defined in Chapter 4, the Outgoing CAC table, which we will discuss in the next section, is located entirely inside the FPGA. All other data
tables, including the Routing table, Incoming Connectivity table, Outgoing Connectivity
table, User/Control Mapping table, and State table, reside in an external Ternary Content
Addressable Memory (TCAM) device. Some data tables, such as the Routing table, User/
Control Mapping table, and State table, have extra data stored in an external Static RAM
(SRAM) device associated with the TCAM. The organization of these data tables inside
the TCAM and SRAM devices will be discussed in Chapter 6. Since multiple data tables share the same TCAM and SRAM devices, and since in the message processing stage different objects are processed concurrently, possibly requesting simultaneous access to different data tables, a data table management module is designed to arbitrate and sequence the requests for the different data tables.
The data table management module in the FPGA (see Fig. 32) implements a priority
arbitrator. We use three guidelines to set the priorities. First, if the look-up of a data table returns data that is used for look-ups in other data tables, the former has higher priority. Fig. 37 shows the dependencies among the data tables; from it we can conclude that the Routing table has higher priority than the User/Control Mapping table and the Outgoing Connectivity table. Second, if the look-up of a data table has follow-up look-ups while that of another does not, the former has higher priority. For this reason, the Routing table has higher priority than the Incoming Connectivity table, whose look-up involves no other table before the final State table update. Third, if a data table has more follow-up operations, it has higher priority. For this reason, the Outgoing Connectivity table has higher priority than the User/Control Mapping table: no further operation follows an access of the User/Control Mapping table, while the time-consuming resource reservation follows an access of the Outgoing Connectivity table.

Figure 37. Dependencies among the data tables. [Diagram: the Routing table look-up feeds the User/Control Mapping table and Outgoing Connectivity table look-ups; these, together with the Incoming Connectivity table look-up and a State table read, lead to the final State table write.]
Fig. 38 shows the priority arbitrator for the TCAM device. It has two interfaces. On
the left side, it interfaces to the other part of the hardware signaling accelerator, accepting
requests and sending out grant signals. Based on the priority, at any moment only one of
the requests can be satisfied. On the right side, it interfaces to the TCAM device. If there is
any request for a data table, the TCAM_Req is active. Since multiple data tables reside in
the same TCAM device, and typically a TCAM device is partitioned into segments, the
data tables can be naturally fitted into different segments. The table index indicates the
segment of the TCAM in which each data table resides; it does not necessarily reflect the
priority. The TCAM interface is very complex, and we only describe the priority arbitrator
related signals in Fig. 38.
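The grant logic can be sketched as follows (a Python behavioral model; the priority ordering follows the high-to-low listing of Fig. 38, and the table names are ours):

```python
# Behavioral sketch of the fixed-priority TCAM arbiter: of all asserted
# requests, only the highest-priority one is granted; Table_Idx selects the
# TCAM segment in which the granted table resides.

# Priority order (high to low) and TCAM segment indices, as in Fig. 38.
PRIORITY = ["state", "routing", "outgoing_conn", "incoming_conn", "uc_mapping"]
TABLE_IDX = {"state": 0b000, "routing": 0b010, "outgoing_conn": 0b011,
             "incoming_conn": 0b100, "uc_mapping": 0b101}

def arbitrate(requests):
    """requests: set of table names with their Req line asserted.
    Returns (granted_table, tcam_req, table_idx)."""
    for table in PRIORITY:
        if table in requests:
            return table, True, TABLE_IDX[table]
    return None, False, None              # no request: TCAM_Req stays inactive
```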
5.2.5 Resource (bandwidth) management
Signaling protocols are used in connection-oriented networks to set up and tear down
connections. As mentioned in Chapter 1, five steps are involved at each switch to set up a
connection. After the first step, i.e., route look-up, a switch checks the availability of
resources (bandwidth and optionally buffer space) on the determined output interface. If resources are available, the switch reserves the required resources. When tearing down a connection, the switch releases the allocated resources.

Figure 38. Priority arbitrator for TCAM. [Diagram: request (Req) and grant (Gnt) signal pairs for the State table, Routing table, Outgoing Connectivity table, Incoming Connectivity table, and User/Control Mapping table enter the priority arbitrator in high-to-low priority order; the arbitrator drives TCAM_Req and Table_Idx toward the TCAM. Table indices: State table 000, Routing table 010, Outgoing Conn table 011, Incoming Conn table 100, User/Ctrl Mapping table 101.]
The hardware signaling accelerator targets SONET switches, in which case
“resources” are the time-slots on the output interface. Fig. 39 shows the resource manage-
ment module implemented as part of the hardware signaling accelerator. It consists of a
CAC table and a bandwidth allocator. The CAC table maintains the available time-slots on
each output interface. Our target switch fabric has 64 output interfaces (see Chapter 6),
each with 12 time-slots (STS-12 interface). Therefore the CAC table has 64 entries, each
with a bit-vector of 12 bits. A bit value of '1' indicates that the corresponding time-slot is available; '0' indicates that it is not. The CAC table is small enough to fit into on-chip memory in the FPGA. This is also desirable since the CAC table is tightly coupled with the bandwidth allocator, in the sense that each request accesses the CAC table twice: the bit-vector is first read out, and after resource reservation or release, the updated bit-vector is written back to the CAC table.
A hierarchical bandwidth allocator, which consists of a STS-1 designator, a STS-3c
designator, and a STS-12c designator, is designed because of the hierarchy and concatena-
tion features of SONET signals. There are four 3-input AND gates between STS-1 desig-
nator and STS-3c designator and one 4-input AND gate between STS-3c designator and
STS-12c designator. A priority decoder (MSB has the highest priority) is used to select time-slots as per the request.

Figure 39. Resource management module. [Diagram: the CAC table (64 x 12-bit entries) feeds the bandwidth allocator; the 12 STS-1 availability bits are AND-reduced in groups of three into four STS-3c bits, which are AND-reduced into one STS-12c bit.]

For example, if there is a request for 3 STS-1s on output inter-
face 25 (“011001”), we first get the entry corresponding to output interface 25 in the CAC
table. The entry is then copied to the bandwidth allocator. The designator corresponding to
STS-1 is checked and three time-slots are reserved, as shown in Fig. 40a. If the request is
for an STS-3c, the STS-3c designator is checked and time-slots reserved. If an STS-12c is
requested, the STS-12c designator determines that this request cannot be satisfied. After
resource reservation, the updated bit-vector is written back to the CAC table.
A priority decoder gives the best chance that contiguous concatenation at a higher level of the hierarchy will not be broken by allocations at a lower level. For example, in Fig. 40b, an allocation scheme other than priority decoding is used, leaving no availability at the STS-3c level; a new request for an STS-3c cannot be fulfilled even though enough time-slots are available.
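The hierarchical allocator can be sketched in Python (a behavioral model of the AND-reduction and the MSB-first priority decoder; the function names are ours):

```python
# Behavioral sketch of the hierarchical bandwidth allocator: AND-reduce the
# 12 STS-1 availability bits into 4 STS-3c bits and 1 STS-12c bit, then pick
# slots with an MSB-first priority decoder.

def designators(avail):                      # avail: 12 bits, MSB first
    sts3c = [int(all(avail[i:i + 3])) for i in range(0, 12, 3)]   # 3-input ANDs
    sts12c = int(all(sts3c))                                       # 4-input AND
    return sts3c, sts12c

def allocate(avail, kind, n=1):
    """Reserve n STS-1s, or one STS-3c/STS-12c; returns slot positions or None."""
    sts3c, sts12c = designators(avail)
    if kind == "STS-1":
        picks = [i for i, bit in enumerate(avail) if bit][:n]      # MSB-first
        if len(picks) < n:
            return None
    elif kind == "STS-3c":
        groups = [g for g, bit in enumerate(sts3c) if bit]
        if not groups:
            return None
        picks = list(range(groups[0] * 3, groups[0] * 3 + 3))      # MSB-first group
    elif kind == "STS-12c":
        if not sts12c:
            return None
        picks = list(range(12))
    for p in picks:
        avail[p] = 0                         # reserving = clearing the bits
    return picks

# The example of Fig. 40a: reserve 3 STS-1s MSB-first.
a = [0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1]
```

Reserving the three MSB-most STS-1s leaves the two high STS-3c groups intact, reproducing the Fig. 40a outcome.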
5.2.6 Re-transmission management
Timers are required to support the solution proposed in RFC 2961 [40] for reliable
message transmission. As discussed in Chapter 3, this solution requires every message to
carry a MESSAGE_ID object, which is acknowledged by MESSAGE_ID_ACK object car-
ried in an Ack message or piggy-backed in a message in the reverse direction. If the
MESSAGE_ID_ACK is not received before a retransmission timer times out at the sender,
the message is re-transmitted. A second timer, which we call piggyback timer, is used to
hold MESSAGE_ID_ACK objects awaiting a message to be sent in the reverse direction, to avoid unnecessary Ack messages. An Ack message is generated only if this timer expires.

Figure 40. Comparison of resource allocation schemes. [Diagram: starting from Avail_BW = 0 0 1 1 0 1 1 1 1 1 1 1 (MSB to LSB; STS-3c level 0 0 1 1, STS-12c level 0): (a) a priority-decoder based allocator reserves 3 STS-1s from the MSB side, leaving 0 0 0 0 0 0 1 1 1 1 1 1 with STS-3c level 0 0 1 1; (b) a round-robin style allocator leaves 0 0 1 1 0 1 1 0 0 0 1 1 with STS-3c level 0 0 0 0, so no STS-3c can be allocated.]
Fig. 41 illustrates the proposed re-transmission management scheme. The hardware signaling accelerator maintains a system timer, t_SYS, which provides system timing for all other timers. On the transmitting side, when a signaling message is sent out, the message, together with a time tag marking the transmitting time (the t_SYS value at the moment the message is transmitted), is also copied to the unacknowledged message buffer. The buffer is composed of equal-sized blocks, one block per message, and organized as a queue (first in, first out); therefore the head of the queue always contains the oldest message. The time tag of the head message is copied to the retransmission timer, timer_0. Assuming the initial time-out value is TO (this value is kept in a register and set at initialization time), when t_SYS >= timer_0 + TO, the re-transmission timer times out, and the message is retransmitted and copied to the buffer for the first re-transmission. The first re-transmission buffer is organized in a similar way, with a retransmission timer, timer_1, and a time-out value of 2TO (exponential back-off). Following [40], we support at most three re-transmissions.

Figure 41. Re-transmission management (buffers and timers). [Diagram: on the transmitting side, FIFO queues of (message, time tag) entries, from the unacknowledged message buffer with timer_0 through the buffer for the 2nd retransmission with timer_2; on the receiving side, per-neighbor FIFO buffers of MESSAGE_ID_ACKs (neighbor 0 through neighbor k) with piggyback timers p_0 through p_k.]

On the receiving side, each received message must be acknowledged with a
MESSAGE_ID_ACK object. The acknowledgment can be delayed (bounded by the piggyback time-out value) so that a MESSAGE_ID_ACK object can be piggybacked in a message in the reverse direction, or multiple MESSAGE_ID_ACKs can be packed into one Ack message. For this purpose, one buffer is allocated per neighbor. Each entry in buffer i is a MESSAGE_ID_ACK object destined for neighbor i. Associated with each entry is a time tag indicating the time at which the message carrying the corresponding MESSAGE_ID was received. Entries in each buffer are time-ordered given the FIFO nature of these buffers. For each neighbor i, we maintain a piggyback timer p_i, which keeps the time tag value of the head entry. Assuming the piggyback time-out value is PTO (this value is kept in a register and set at initialization time), when t_SYS >= p_i + PTO, the piggyback timer p_i times out. If this happens or the buffer is full, an Ack message is generated to carry all the MESSAGE_ID_ACKs in the buffer. After these MESSAGE_ID_ACKs are flushed from the buffer, the piggyback timer is reset to the time tag value of the next available head entry. If there is a message destined to neighbor i before p_i times out, the outstanding MESSAGE_ID_ACKs for this neighbor can be piggybacked and sent with the message (within, of course, the maximum message length limit).

The re-transmission timers and piggyback timers work in a similar way, but with important differences. There are three re-transmission timers (and associated queues) in the system, each corresponding to one re-transmission attempt, and the time-out value doubles each time. In contrast, the number of piggyback timers depends on the number of neighbors; each neighbor has a separate piggyback timer and queue, and the time-out values of these piggyback timers are all the same.
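The timer mechanics can be sketched as follows (a Python behavioral model; the TO and PTO values and queue/function names are hypothetical):

```python
# Behavioral sketch of the re-transmission and piggyback timers: FIFO queues of
# (time_tag, item) pairs; a queue's timer is simply its head entry's time tag,
# compared against the system time t_sys plus the queue's time-out value.
from collections import deque

TO, PTO = 100, 40                  # initialization-time register values (hypothetical)

def expired(queue, t_sys, timeout):
    """timer = head time tag; expires when t_sys >= timer + timeout."""
    return bool(queue) and t_sys >= queue[0][0] + timeout

def on_timeout(retx_queues, t_sys):
    """Move the head message of queue k to queue k+1 with exponential back-off
    (time-outs TO, 2TO, 4TO); at most three re-transmissions are attempted."""
    for k, q in enumerate(retx_queues):
        if expired(q, t_sys, TO << k):
            _, msg = q.popleft()
            if k + 1 < len(retx_queues):
                retx_queues[k + 1].append((t_sys, msg))   # re-transmit and re-queue
            return msg
    return None

retx = [deque(), deque(), deque()]              # one queue per re-transmission attempt
retx[0].append((0, "Path#1"))                   # sent at t_sys = 0
```

The per-neighbor piggyback queues use the same `expired` test with PTO as the time-out value.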
5.3 Implementation and simulation results
As a proof of concept, we developed a prototype VHDL model for the hardware sig-
naling accelerator, used Synplify for synthesis and Xilinx ISE for placement and routing.
The implementation uses of the FPGA resources (Xilinx XC2V3000) without the
PCI core, which we plan to include on the FPGA. Table 13 shows the detailed implemen-
tation results.
We performed timing simulations of the hardware signaling accelerator using the Model-
Sim simulator. Fig. 42-Fig. 44 show the simulation results for the processing of Path, Resv,
and PathTear messages, respectively. Processing the Path message, which involves the
access and updating of the data tables, takes 37 clock cycles. The time to receive a Path
message (a message is parsed while it is being received) is 40 clock cycles, as is the time
to transmit the outgoing Path message (a message is transmitted while it is being assem-
bled). Receiving, processing, and transmitting all other messages each takes no more than
40 clock cycles. A detailed breakdown of the typical processing time for each message is
shown in Table 14.
Because the parsing, processing, and assembling stages are fully pipelined, idle cycles
are inserted if a stage is not ready. As a worst-case estimate, the total time for parsing,
Table 13. Implementation results.
Device     PCI core   Resource   Eq. Gates   Max freq.
XC2V3000   w/o PCI    12%        360,000     90MHz
           w/ PCI     21%        630,000     50MHz
Table 14. Clock cycles to receive an RSVP-TE message.
               Path   Resv   PathTear/ResvTear
Clock cycles   40     32     19
processing, and assembling a single message consumes 120 clock cycles, which is 2.4
microseconds with a 50MHz clock. Since connection setup/release requires the handling
of three signaling messages, we require a total of 7.2 microseconds per call. The call han-
dling capacity is as high as 400,000 calls/sec because of pipelining, which allows the sys-
tem to accept a new message every 40 clock cycles.
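The arithmetic behind these figures can be checked directly (a Python sketch, with variable names of our choosing; the exact pipelined capacity is 50MHz / 120 cycles ≈ 416,666 calls/sec, which the text conservatively rounds to 400,000):

```python
# Throughput arithmetic from the timing simulations: 120 clock cycles per
# message at a 50MHz clock, three messages per call, and (thanks to
# pipelining) a new message accepted every 40 clock cycles.
CLOCK_HZ = 50e6
CYCLES_PER_MESSAGE = 120          # worst-case parse + process + assemble
MESSAGES_PER_CALL = 3             # setup/release handles three signaling messages
PIPELINE_INTERVAL_CYCLES = 40     # a new message can enter every 40 cycles

message_time_us = CYCLES_PER_MESSAGE / CLOCK_HZ * 1e6    # 2.4 us per message
call_time_us = message_time_us * MESSAGES_PER_CALL       # 7.2 us per call
# Pipelined capacity: one message drained every 40 cycles, three per call.
calls_per_sec = CLOCK_HZ / (PIPELINE_INTERVAL_CYCLES * MESSAGES_PER_CALL)
```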
Figure 42. Processing of Path message.
Figure 43. Processing of Resv message.
Previousstage ready
TCAMinterface
SRAMinterface
Switch fabricinterface
Resvmessage
Read Statetable
Programswitch fabric
End ofprocessing
5.4 Re-configurability
The reason we choose FPGA as the hardware platform for the signaling accelerator is
its re-configurability. The FPGA device we use, Xilinx Virtex-II, can support runtime par-
tial re-configuration, meaning we can re-configure it even when it is working in the field
and we can selectively re-configure part of the device and keep the other part unchanged.
For example, our hardware signaling accelerator targets SONET switches. Specifi-
cally, it is designed for the Vitesse 64x64 switch fabric, with STS-12 interfaces and an STS-1
cross-connect rate. Accordingly, the Incoming/Outgoing Connectivity tables each have 64
entries, corresponding to 64 interfaces. The CAC table also has 64 entries for the 64 outgoing
interfaces, each with 12 bits for the 12 time-slots in an STS-12 signal. The bandwidth alloca-
tor consists of a 3-level hierarchy, the lowest level corresponding to STS-1 (the cross-con-
nect rate) and the highest level corresponding to STS-12c (the maximum interface rate). If we
are going to support a 512x512 switch fabric with OC-192 interfaces and an OC-3c cross-
connect rate (such as the Sycamore® SN16000 [44]), we need to expand the Incoming/Outgo-
ing Connectivity tables and the CAC table to 512 entries. In the CAC table, each entry has 64 bits,
Figure 44. Processing of PathTear message.
each bit corresponding to an OC-3c signal. We also need to re-design the bandwidth allo-
cator. The new bandwidth allocator consists of 4 levels, the lowest level corresponding to
the OC-3c cross-connect rate, followed by OC-12c and OC-48c, with OC-192c as the
highest level. All other parts of the hardware signaling accelerator can remain unchanged.
A Xilinx Virtex-II FPGA can be re-configured in several milliseconds [43], which
makes hardware upgrading as easy as software upgrading.
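The CAC-table scaling rule above can be captured in a short sketch (Python; the function name and the unit convention, rates expressed in multiples of STS-1/OC-1, are our assumptions):

```python
# Sketch of the CAC-table sizing rule: one entry per outgoing interface and
# one bit per time-slot, where a time-slot is one unit of the cross-connect rate.
def cac_table_size(num_interfaces, interface_rate, crossconnect_rate):
    """Return (entries, bits_per_entry); rates in multiples of STS-1/OC-1."""
    return num_interfaces, interface_rate // crossconnect_rate

# Current design: 64 STS-12 interfaces, STS-1 cross-connect -> 64 entries x 12 bits.
current = cac_table_size(64, 12, 1)
# SN16000-class fabric: 512 OC-192 interfaces, OC-3c cross-connect -> 512 x 64 bits.
upgraded = cac_table_size(512, 192, 3)
```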
Chapter 6 Design of a Prototype Switching System
In Chapter 1, we showed the unfolded view of a typical switching system. A complete
switching system includes both control-plane and user-plane devices. The user plane, car-
rying user traffic, typically consists of line cards terminating the incoming and outgoing
user traffic, and a switch fabric switching user traffic from an incoming interface to an
outgoing interface. The control-plane, on the other hand, controls the operation of the
user-plane. The control plane typically implements functionalities such as routing, signal-
ing, and link management.
In order to demonstrate a complete switching system equipped with our FPGA imple-
mentation of the hardware signaling accelerator, we design a prototype switching system. It is
a PCI expansion board including both control-plane and user-plane devices. The control-
plane module implements the subset of RSVP-TE we defined in Chapter 3. The user-plane
device, a Vitesse 64x64 switch fabric, provides 64x64 switching capability under the con-
trol of the hardware signaling accelerator. The prototype switching system is connected to
a host computer through the PCI local bus. The host computer runs the software signaling
process. We have not implemented routing and link management modules in this switch-
ing system, but assume that these modules can be added as software modules on the host
computer. These modules initialize and maintain the Routing table and Incoming/Outgo-
ing Connectivity tables on the PCI board.
In this chapter, we discuss the design of the prototype switching system. We first sum-
marize the architecture of the prototype board, and then give the details of the main func-
tional devices. We also present the on-board clock and power distribution schemes, both
of which are essential for high-speed PCB design.
6.1 Architecture of the PCI-Based prototype switching system
Fig. 45 shows the architecture of the proposed prototype switching system. It is a PCI
expansion board. The core component on the board is the hardware signaling accelerator,
which is a Xilinx XC2V3000 FPGA [41]-[43]. It implements the RSVP-TE subset
described in Chapters 3 and 4 and interfaces to the other devices on this board. We only
implement a single signaling channel, on which messages are received and transmitted.
This is a Gigabit Ethernet (GbE) interface. RSVP-TE messages are carried within IP
packets and arrive on this interface. Thus messages sent to different neighboring switches
can all be carried on the same GbE interface and switched through an IP router or Ethernet
switch to their appropriate destinations. More details on the selection of signaling chan-
nel(s) and the analysis of signaling transport mechanisms are given in Chapter 7. The opti-
cal transceiver, SerDes (Serializer/Deserializer), and GbE Media Access Control (MAC)
chips shown in Fig. 45 form the signaling channel.
Processing of RSVP-TE messages involves many data tables. These data tables can be
fairly large. Therefore we include a TCAM and an SRAM (SRAM0) on our prototype
board to hold these data tables. The data tables in these memory chips are initialized from
Figure 45. Architecture of the prototype board.
the host system connected to this board through the PCI bus. The SRAM1 is used to buffer
the transmitted but as-yet unacknowledged messages. Specifically, we choose IDT®
IDT75P52100, a 64K x 72 Network Search Engine (NSE) device, as our TCAM, and
IDT® IDT71V2556, a 128K x 36 SRAM device.
The FIFO is the main interface unit between the hardware signaling accelerator and
the software signaling process. It is used to buffer signaling messages that cannot be han-
dled by hardware. If any error occurs in the message parsing or processing stage, the mes-
sage is queued in this FIFO. The software signaling process will fetch the messages from
the FIFO and process them as needed.
Finally, we include a SONET switch fabric as the user-plane device (note that in
Chapter 3, we stated that this RSVP-TE subset is being designed for such a SONET
switch). Specifically, we selected the Vitesse VSC9182 device, which is a 64x64 switch
fabric with STS-12 interfaces and an STS-1 cross-connect rate. In a full implementation,
the switch fabric will require its input and output ports to be connected to O/E and E/O
devices for termination of optical fiber links. Since our prototype board is focused on the
control-plane aspects of the switch, we do not test actual user data flow through this
device. Instead we include the user-plane device mainly to test whether or not our RSVP-
TE hardware signaling accelerator can perform the functions of setting up and deleting
cross-connections through the user-plane switch fabric device.
6.2 Main on-board devices
6.2.1 1 Gbps signaling channel [45][46][47]
The 1Gbps signaling channel consists of a GbE MAC, a SerDes, and an optical transceiver,
as shown in Fig. 46. The optical transceiver, the Agilent® HFCT-53D5, provides conversion
between electronic and optical signals. The SerDes, the Agilent HDMP-1636A, converts
between the 8B/10B encoded parallel signals (125MHz) and the high-speed serial signal
(1250MHz). Since the signals are 8B/10B encoded, the coding efficiency is 80%; the
serial data rate is 1250MBd/sec, or 1000Mbps (1Gbps).
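The line-rate arithmetic can be checked as follows (a Python sketch; variable names are ours):

```python
# 8B/10B line-rate arithmetic for the 1Gbps signaling channel.
parallel_clock_mhz = 125                    # one 10-bit code word per clock
serial_rate_mbd = parallel_clock_mhz * 10   # 1250 MBd on the serial line
coding_efficiency = 8 / 10                  # 8 payload bits per 10 line bits (80%)
data_rate_mbps = serial_rate_mbd * coding_efficiency   # 1000 Mbps = 1 Gbps
```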
The LSI Logic® L8104 device is one of the most widely used GbE MAC controllers.
Figure 46. 1Gbps signaling channel.
Figure 47. Block diagram of L8104.
Fig. 47 shows the block diagram of the L8104. Internally, it consists of the GbE MAC, an 8B/10B
Physical Coding Sublayer (PCS), a 16K-byte receive FIFO, and a 4K-byte transmit FIFO.
Externally, it has three interfaces: a 32-bit system interface connecting to the hardware sig-
naling accelerator (interface signals are given in Table 15), a 10-bit PHY interface con-
necting to the SerDes, and a 16-bit register interface connecting to the FPGA for
configuration purposes.
In Chapter 5, the simulation results showed that the call handling rate of the hardware
signaling accelerator can be as high as 400,000 calls/sec. This conclusion is based on a
working clock of 50MHz, which is the maximum clock frequency of the hardware signal-
ing accelerator. In the prototype system, limited by the single 1Gbps signaling channel,
the call handling rate is less. For the 32-bit GbE interface, the working clock is 33MHz
(32-bit × 33MHz ≈ 1Gbps). Consequently, the call handling rate is 250,000 calls/sec.
6.2.2 Incoming message buffer - FIFO device [48]
In Chapter 5, we introduced a two-level incoming message buffering system. We also
discussed the dual-port RAM and the related buffer management module inside the
FPGA. In this section, we discuss the external message buffer, i.e. the FIFO device, to
complete the discussion of this two-level incoming message buffering system. Compared
Table 15. L8104 system interface.
Transmit Interface                            Receive Interface
Signal     I/O  Description                   Signal     I/O  Description
TXD[31:0]  I    Transmit data                 RXD[31:0]  O    Receive data
TXBE[3:0]  I    Transmit byte enable          RXBE[3:0]  O    Receive byte enable
TXENn      I    Transmit enable               RXENn      I    Receive enable
TXSOFn     I    Transmit start of frame       RXSOFn     O    Receive start of frame
TXEOFn     I    Transmit end of frame         RXEOFn     O    Receive end of frame
TXWM1      O    Transmit FIFO watermark 1     RXWM1      O    Receive FIFO watermark 1
to the internal dual-port RAM, the external FIFO is much larger (64K x 36 compared to
256 x 32). And unlike the internal dual-port RAM, the external FIFO is not segmented.
Instead, the messages are stored in the FIFO continuously.
Fig. 48 shows the block diagram of the IDT72V36100. It consists of a 64K x 36 RAM array,
a write pointer, a read pointer, and the related control logic and flag logic. It has two inde-
pendent data ports, an input port and an output port. Table 16 lists the main interface sig-
nals on both ports.
The input port interfaces with the dual-port RAM and buffer management module,
which are part of the two-level message buffering system inside the FPGA. The
D[31:0] bus connects to the output of the internal dual-port RAM. The WEN signal indicates that
there is data on the input port waiting to be written. If there is an error message in the
internal dual-port RAM (recall the Error flag bits we defined), WEN is active. The FF signal
indicates that the FIFO is full and cannot accommodate more messages.
Table 16. Main interface signals of IDT72V36100.
Input Port                                     Output Port
Signal   I/O  Description                      Signal   I/O  Description
D[31:0]  I    Data input                       Q[31:0]  O    Data output
WCLK     I    Write clock                      RCLK     I    Read clock
WEN      I    Write enable                     REN      I    Read enable
FF       O    Full flag                        EF       O    Empty flag
PAF      O    Programmable almost-full flag    PAE      O    Programmable almost-empty flag
Figure 48. Block diagram of IDT72V36100.
The output port interfaces with the PCI module residing inside the FPGA, through
which the software signaling process can fetch the messages in the FIFO. The software
module has two options to fetch messages: through interrupts or via polling. With inter-
rupts, the hardware signals the software if there are messages waiting to be fetched, and
the software responds upon request. With polling, the software polls the prototype board
periodically. Since we do not expect many messages to require software intervention, we
choose the interrupt scheme instead of polling.
The Q[31:0] bus connects to the PCI module inside the FPGA. A register Q[31:0] inside
the FPGA is mapped to an address in the PCI memory space (BASE + 0x00000010 in our
design, where BASE is the PCI card base memory address assigned by the operating system).
The software signaling process (including a device driver) can read messages from this
address. When the software signaling process is ready to process the interrupt from the
FIFO, the PCI module activates REN. The design of the interrupt-handling mechanism
needs further work. If the occurrence of messages needing software intervention is rare,
the EF signal can be used to trigger an interrupt. Whenever there is a message waiting in
the FIFO (EF is not active), an interrupt is generated. On the other hand, if the occurrence
of such messages is frequent, the rate of interrupts could be too high. As an alternative, we
can use PAF to trigger an interrupt. On the PCI local bus, there is a parameter called the PCI
Latency Timer. It determines the maximum number of clock cycles a PCI device can hold
the PCI bus. A typical value of the PCI Latency Timer is 32, which is the default value for
most main boards. The number of PCI devices in a personal computer system is typically
fewer than six. The maximum response time to an interrupt is thus 192 clock cycles
(assuming that the prototype board is the last PCI device polled and all other PCI devices
use their full 32-clock cycle allocation). Therefore, we set the PAF threshold to 256 (larger
than 192). When the FIFO is almost full, i.e., there are only 256 or fewer spaces available,
the PAF signal will be active and trigger an interrupt.
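The threshold arithmetic above can be sketched as a quick check (Python; the variable names are ours):

```python
# Worst-case PCI interrupt-response arithmetic behind the PAF threshold choice.
LATENCY_TIMER_CYCLES = 32   # default PCI Latency Timer on most main boards
MAX_PCI_DEVICES = 6         # typical upper bound in a personal computer

worst_case_cycles = LATENCY_TIMER_CYCLES * MAX_PCI_DEVICES   # 192 cycles
PAF_FREE_ENTRIES = 256      # trigger when only this many FIFO entries remain free
headroom = PAF_FREE_ENTRIES - worst_case_cycles              # 64 entries of margin
```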
6.2.3 Data tables - the TCAM and SRAM devices [49][50]
In Chapter 5, we discussed the data table management module inside the FPGA that
manages the data tables residing in the external TCAM and SRAM devices. In this sec-
tion, we first introduce the TCAM and SRAM devices we use, and then detail the organiza-
tion of the data tables in these devices.
6.2.3.1 The TCAM and SRAM devices
TCAM stands for Ternary CAM, where CAM stands for Content Addressable Mem-
ory. A CAM, like a RAM, is a memory device used to store data, though it works in a dif-
ferent manner. In a RAM, the user supplies an address and gets back the data. With a CAM,
on the contrary, the user supplies the data and gets back the address. Fig. 49 com-
pares a RAM and a CAM. Typically, a CAM works in conjunction with a RAM. The
matched address obtained from the CAM look-up is used to fetch the data stored in the
associated RAM, as shown in Fig. 49b. This feature makes CAMs suitable for table look-
ups. For example, in our hardware signaling accelerator, to find the next hop corresponding
to a destination IP address, we use the destination IP address for a CAM look-up and
then obtain the next-hop address stored in an associated RAM device.

Figure 49. RAM and CAM. (a) Indexing a RAM: supplying the address 0x7FFE returns
the data stored there (0xABCD5678). (b) Looking up a CAM: supplying the data
0xABCD5678 returns the matching address (0x7FFE), which indexes the associated RAM.
Unlike a CAM, where each binary digit (or bit, for short) can hold only one of two
values, '0' or '1', a TCAM stores ternary digits, each of which can hold one of three values: '0', '1', or
'X' (don't care). In practice, a ternary digit is often represented by two binary digits (2
bits). Therefore a TCAM array typically consists of two memory arrays, a data array and a
mask array. A bit value of '0' in the mask array indicates that the corresponding ternary digit is
'X' no matter what the bit value is in the data array. On the other hand, if a bit value in the mask
array is '1', the corresponding ternary digit value is determined by the bit value in the data
array. A TCAM digit of 'X' is ignored when performing a content match.
Fig. 50 illustrates how a TCAM works. When a search word, e.g., 128.238.34.219, is
given, it is first ANDed with the masks in the mask array and then compared to the whole data
array to find a match. If multiple matches are found (in this example we have 3 matches),
a priority decoder then selects the one with the highest priority (hence 0x1234 is the
matched address). TCAMs are especially suitable for IP routing table look-up, where a
Longest Prefix Match (LPM) is required. The routing table is sorted according to the
length of the network prefix, with the longest-prefix entries located at the top (highest prior-
ity). The network prefix is stored in the data array, and the subnet mask is stored in the
mask array. In this way, if there is a match, it is always the longest one.
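The mask-then-compare behavior described above can be modeled in a few lines (a Python sketch; the function and helper names are ours, and the entries reproduce the Fig. 50 example):

```python
# Minimal model of a TCAM longest-prefix match. Each entry is a (data, mask)
# pair; zeroed mask bits are "don't care". Entries are stored longest-prefix
# first, so the priority decoder is simply "first hit wins".
def tcam_lookup(entries, search_word):
    """Return the address (index) of the highest-priority match, or None."""
    for address, (data, mask) in enumerate(entries):
        if (search_word & mask) == (data & mask):
            return address
    return None

def ip(dotted):
    a, b, c, d = (int(x) for x in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

# The routing-table example of Fig. 50, sorted long-to-short prefix:
entries = [
    (ip("128.238.34.208"), ip("255.255.255.240")),   # /28, highest priority
    (ip("128.238.34.0"),   ip("255.255.255.0")),     # /24
    (ip("128.238.0.0"),    ip("255.255.0.0")),       # /16
]
```

Searching for 128.238.34.219 matches all three entries, and the priority decoder returns the first, i.e. the longest prefix.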
In our prototype system, we choose IDT75P52100 and IDT71V2556 as our TCAM
and SRAM devices. The hardware signaling accelerator, TCAM, and SRAM are con-
nected as shown in Fig. 51. When there is a request for a data table (recall the priority arbi-
trator in the data table management module), the hardware signaling accelerator sends out
a request, together with appropriate command and data to the TCAM. The TCAM device
will send back an acknowledgment signal if there is a match. The matched address can be
used to fetch the extra data stored in the SRAM device. Some data tables, i.e. the Incoming
and Outgoing Connectivity tables, have no associated data stored in the SRAM. In this
case, part of the matched address is returned to the hardware signaling accelerator.
The command bus consists of three parts, as shown in Fig. 52. INST[2:0] specifies the
instruction type (“010” for look-up), LTIN[1:0] specifies the width of the look-up (“00”
for 72-bit look-up), and SEGSEL[2:0] specifies which Segment Select Register will be
Figure 50. How a TCAM works.
Figure 51. FPGA/TCAM/SRAM configuration.
used for look-up. We will discuss LTIN[1:0] and SEGSEL[2:0] in more detail in the next
subsection.
In Fig. 53, we use a Routing table look-up as an example to show a typical look-up oper-
ation. At time t0, the hardware signaling accelerator activates the REQSTB signal, indi-
cating that there is a look-up request. At the same time, it drives the look-up command onto
the CMD bus, and the destination IP address onto the REQDATA bus. Five clock cycles later
(in CCLK cycles; we will explain later that the frequency of the internal working clock is only
half that of the I/O clock, CLK2X), at time t10, the matched address appears on the ADR bus
connecting to the SRAM device. The TCAM device also drives the CE and R/W signals to
enable the SRAM read operation. Two clock cycles later, at time t14, the returned next-
hop IP address is available on the SRAM data bus. The HITACK signal indicates that it is
a successful look-up. The signals used for the table look-up operation are summarized in
Table 17.
Although it takes seven clock cycles to finish a typical look-up operation, it is not nec-
essary to wait until the end of the previous look-up to start a new one, as long as there is
no dependency between two consecutive look-up operations. For example, if the current
look-up operation is conducted on the Outgoing Connectivity table, a look-up on the
Figure 52. TCAM command bus.
Incoming Connectivity table can start immediately after it, followed by a look-up on the
User/Control Mapping table in the next clock cycle.
6.2.3.2 The organization of the data tables
The TCAM device is typically split into segments for two important reasons. First,
each segment can be selectively turned off to reduce power consumption. Second, each
segment can be flexibly configured, and multiple segments can be combined, to accom-
modate tables with different widths and depths. This feature is especially useful when
multiple data tables with different widths and depths reside in the same TCAM device.
Table 17. Typical TCAM and SRAM signals.
Signal         I/O  Description
REQSTB         I    Request strobe
Command Bus:
  INST         I    Instruction
  LTIN         I    Look-up type
  SEGSEL       I    Segment select
REQDATA        I    Request data bus
To SRAM:
  ADR          O    Address
  CE           O    Chip enable
  R/W          O    Read/Write enable
HITACK         O    Match acknowledge
From SRAM:
  IO[31:0]     I/O  Data from SRAM
Figure 53. 72-Bit look-up timing diagram.
The TCAM device we use, IDT75P52100, has sixteen independent segments, and five
data tables share these segments.
The IDT75P52100 has eight Segment Select Registers (SSRs), each with 16 bits representing
the 16 segments. A bit value of '1' indicates that the corresponding segment is selected, and vice
versa. In each look-up command, the SEGSEL[2:0] signal specifies which SSR is used for the
current table look-up operation, and the LTIN[1:0] signal specifies the width of the data table.
The Routing table has 32K entries, each a 32-bit IP address. In the TCAM device, it occupies
eight segments, each configured as 4K x 72-bit (72-bit is the minimum width of a possible
configuration). The Incoming Connectivity table has 64 entries, 38 bits each (a 32-bit neigh-
bor's IP address plus a 6-bit interface ID). In the TCAM device, it occupies one segment (the
minimum unit for configuration), which is configured as 4K x 72-bit. The Outgoing Con-
nectivity table (64 x 38-bit) and User/Control Mapping table (64 x 32-bit) are config-
ured in a similar way. The State table has 1K entries, each entry with 128 bits (a five-tuple,
as we explained before). In the TCAM device, it occupies one segment, which is configured
as 2K x 144-bit. Fig. 54 shows the configurations of the five data tables in the TCAM
device, and the corresponding look-up commands and SSR registers.
Figure 54. Configurations of the five data tables in the TCAM device.
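The SSR bitmaps of Fig. 54 can be expressed compactly (a Python sketch; the helper name, and the bit ordering with segment 0 as the most-significant bit as drawn in Fig. 54, are our assumptions):

```python
# Segment Select Register (SSR) bitmaps from Fig. 54: sixteen segments, one
# bit each; a '1' bit selects the corresponding segment for the look-up.
def ssr(segments, total=16):
    """Build a 16-bit SSR value with the given segment indices selected."""
    value = 0
    for s in segments:
        value |= 1 << (total - 1 - s)   # segment 0 drawn as the MSB in Fig. 54
    return value

SEGSEL = {
    0b000: ssr(range(0, 8)),  # SegSel0: Routing table, eight 4K x 72-bit segments
    0b001: ssr([8]),          # SegSel1: Incoming Connectivity table
    0b010: ssr([9]),          # SegSel2: Outgoing Connectivity table
    0b011: ssr([10]),         # SegSel3: User/Control Mapping table
    0b100: ssr([12]),         # SegSel4: State table (2K x 144-bit)
}
```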
Typically a TCAM device has an associated SRAM device to store extra data; the
matched address is used to fetch the extra data from the SRAM device. The Routing
table, User/Control Mapping table, and State table work this way. However, the Incom-
ing and Outgoing Connectivity tables work differently. As we explained in
Chapter 4, the return value of the Incoming/Outgoing Connectivity table is the interface ID. In
our system, the switch fabric has 64 input/output interfaces, so the interface ID ranges from 0
to 63, or from "000000" to "111111" in binary. We can arrange the locations of the Incom-
ing and Outgoing Connectivity tables in the TCAM device in such a way that the least-significant 6
bits of the matched address represent the interface ID, as shown in Fig. 55. In this way we
save the clock cycles needed to access the SRAM device.
The final mapping of the data tables into the TCAM and SRAM devices is shown in
Fig. 56.
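The address trick of Fig. 55 amounts to a single mask (a Python sketch; the constant names are ours, the base addresses are those shown in Fig. 55):

```python
# Fig. 55 placement trick: the Connectivity tables sit at 0x8000-0x803F and
# 0x9000-0x903F, so the least-significant 6 bits of the matched TCAM address
# are the interface ID and no SRAM read is needed.
INCOMING_CONN_BASE = 0x8000
OUTGOING_CONN_BASE = 0x9000

def interface_id(matched_addr):
    return matched_addr & 0x3F   # interface ID 0-63
```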
Figure 55. Use matched address as interface ID.
6.2.4 User-plane device - VSC9182 [51]
In order to demonstrate a complete signaling system, we included a “thin” user plane,
with the sole purpose of interfacing the hardware signaling accelerator to an actual switch-
ing device. The Vitesse 64x64 switch fabric, VSC9182, is the selected switching device.
Since it is impractical to design a complete 64x64 switch with the user-plane I/O inter-
faces on a PCI board and since our focus is on the control-plane not the user-plane, we
omitted these I/O interface devices. For testing purposes, we do, however, include SMB
connectors on two ports of the VSC9182 switch fabric to show that user data flows through
after a circuit is established and stops when the circuit is released.

Figure 56. Organization of the data tables.
Fig. 57 shows the block diagram of the VSC9182. Its core is an interconnection
matrix and program memory. It has three interfaces: an input backplane interface consist-
ing of 64 Low Voltage Differential Signaling (LVDS) input channels, an output backplane
interface consisting of 64 LVDS output channels, and a CPU interface used to configure
the registers and program the switch fabric. Programming the switch fabric (space switch-
ing and time-slot interchanging) is performed via the CPU interface by presenting the out-
put channel and time-slot address on the address bus A[10:0] and the input channel and
time-slot address on the data bus D[9:0], thereby filling the program memory. There are 64
input channels and 64 output channels, each with 12 time-slots. Fig. 58 shows how a value
on the data bus or address bus represents a certain time-slot on a certain channel.
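The field packing of Fig. 58 can be sketched as follows (a Python model; the function name and range checks are ours, not part of the VSC9182 datasheet):

```python
# Sketch of the Fig. 58 word formats:
# A[10:0] = {register group (1b, 0 = switch map), destination channel (6b),
#            destination time-slot (4b)}
# D[9:0]  = {source channel (6b), source time-slot (4b)}
def program_words(dst_ch, dst_ts, src_ch, src_ts):
    assert 0 <= dst_ch < 64 and 0 <= src_ch < 64   # 64 channels
    assert 0 <= dst_ts < 12 and 0 <= src_ts < 12   # 12 time-slots per channel
    addr = (0 << 10) | (dst_ch << 4) | dst_ts      # register group 0 = switch
    data = (src_ch << 4) | src_ts
    return addr, data
```

Writing `data` to `addr` with a rising edge on WRB, then asserting CONFIG, would install the cross-connection.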
Figure 57. Block diagram of VSC9182.

Figure 58. Address bus and data bus. A[10:0] consists of a 1-bit register group select
(0 = switch), a 6-bit destination channel number (0-63, the output pin number), and a
4-bit destination time-slot number (0-11, where the first transmitted byte is time-slot 0).
D[9:0] consists of a 6-bit source channel number (0-63, the input pin number) and a
4-bit source time-slot number (0-11, where the first received byte is time-slot 0).

Fig. 59 shows the timing diagram to program the switch fabric. A rising edge on WRB
transfers the program information to the program memory. The programming does not
take effect until the CONFIG pin is asserted. The switch fabric may be programmed con-
tinuously, but the new map will not take effect until the user asserts the CONFIG signal.
The current address map may be read back by presenting the output channel and time-slot
address on A[10:0], releasing the D[9:0] signals and lowering RDB, causing the contents
of the register to appear at D[9:0].
The signals used to program the switch fabric are summarized in Table 18.
6.3 Clock distribution networks
Clock distribution is a critical issue in high-speed PCB design. In our system,
there are eight high-speed devices requiring different clock frequencies, ranging from
33MHz to 125MHz. Some devices need highly synchronous timing. In addition, the PCI bus
has a 33MHz system clock supplied by the host computer. Table 19 lists the devices and
Table 18. Signals used to program the switch fabric.
Signal    I/O  Description
D[9:0]    B    Data bus
A[10:0]   I    Address bus
CSB       I    Chip select
WRB       I    Write enable
RDB       I    Read enable
CONFIG    I    Config enable
Figure 59. Timing diagram to program the switch fabric.
the related clock signals.
Fig. 60 shows the clock distribution scheme, which involves four clock sources.
Among these, the 33MHz PCI clock (PCLK) is supplied by the host computer through the
PCI local bus. There are three on-board oscillators providing local timing. The 78MHz
oscillator provides a system clock to the VSC9182 (SYSCLKP/N), from which the on-chip
PLL synthesizes the 622MHz internal clock with a multiplication ratio of 8 (to be accu-
rate, the calculation is as follows: 77.76MHz × 8 = 622.08MHz). Correspondingly, all
input/output lines (LVDS serial signals) run at 622.08Mbps (STS-12). A second, 125MHz oscillator
provides the transmit clock (TCLK) to the L8104 GbE MAC device. This 125MHz
input clock is used by the 8B/10B PCS section and generates the 125MHz transmit output
Table 19. Devices and related clock signals.
Device        Function        Signal      Frequency   Sync. to
HDMP-1636A    SerDes          REFCLK      125MHz      L8104
L8104         GbE MAC         TCLK        125MHz      HDMP-1636A
                              SCLK        33MHz       N/A
                              REGCLK      33MHz       N/A
IDT75P52100   TCAM            CLK2X       100MHz      IDT71V2556S
IDT71V2556S   SRAM            CLK         50MHz       IDT75P52100
IDT72V36100   FIFO            WCLK        50MHz       N/A
                              RCLK        50MHz       N/A
VSC9182       Switch fabric   SYSCLKP/N   78MHz       N/A
XC2V3000      FPGA            GCLK        100MHz      N/A
Figure 60. Clock distribution scheme.
clock (TBC), which is used to output data on the 10-bit PHY interface. The 125MHz TBC
is also sent to the HDMP-1636A as a reference clock (REFCLK). This clock is internally mul-
tiplied by 10 to generate the high-speed clock (1250MHz) necessary for clocking the high-
speed serial outputs (1250MBd/sec, or 1Gbps).
A third, 100MHz oscillator provides base timing for all other devices. Since the FPGA device,
the XC2V3000, is located at the center of the board and all other devices are distributed
around it, the FPGA is a good hub for clock distribution. Besides, the XC2V3000 has
twelve Digital Clock Manager (DCM) modules, which can provide precise clock de-skew,
flexible frequency synthesis, and high-resolution phase shifting. In our clock distribution
scheme, the 100MHz oscillator is connected to the global clock pin (GCLK) of the FPGA.
Inside the FPGA, using the 100MHz GCLK as a reference clock, four DCMs synthe-
size the clock signals for the other devices, as shown in Fig. 60.
The FIFO device, IDT72V36100, is driven by a 50MHz clock signal. Although the
read clock (RCLK) and the write clock (WCLK) can be independent, both clocks are con-
nected to the same clock source in our design. The TCAM device, IDT75P52100, is
driven by a 100MHz clock signal (CLK2X). However, the internal working frequency
is 50MHz, generated with an internal 1/2 divider; the CLK2X provides a
Dual Data Rate (DDR) I/O interface. The timing of the TCAM device and the attached SRAM
device must be highly synchronous. For this reason, the attached SRAM device,
IDT71V2556A, is driven by a clock that originates from the same DCM as the TCAM
device's clock but sits beyond a 1/2 divider inside the FPGA. We have a second SRAM
device for outgoing message buffering. A separate DCM is used to supply a 50MHz clock
to this SRAM device. As we discussed before, the GbE MAC device, L8104, has a clock
125MHz
1250MHz
1250MBd/sec 1Gbps
100MHz
100MHz
50MHz
100MHz
50MHz 1 2⁄ 100MHz
1 2⁄
50MHz
86
input from an oscillator. In addition to the 125MHz transmit clock (TCLK), it has two more
clock inputs, system clock (SCLK) and register-interface clock (REGCLK). These two can
be independent, but in our design, they are connected to the same 33MHz clock source.

6.4 Power distribution scheme
Power distribution is another important issue for high-speed PCB design.
Although the PCI specification [52] states that "the maximum power allowed for any PCI
board is 25 watts," in a typical personal computer system, the maximum power a PCI
board can draw from the main board is less than 10 watts. Table 20 lists the main devices
and their power dissipations. Obviously, the power supply from the PCI bus cannot
meet the total power requirement.
We plan to use external power supplies to solve this problem. This is common in
practice. For example, high-end video cards, such as the Radeon® 9800XT, often require
an external power supply because of the power-consuming Graphics Processing Unit
(GPU). On our prototype board, we plan to put two extra power supply adapters: one
supplies 1.8V and 2.5V to on-board devices, the other supplies 3.3V.

Table 20. Power dissipations of main devices.
Device        Function        Power dissipation
HDMP-1636A    SerDes          900mW
HFBR-53D5     Optical txcvr.  630mW
L8104         GbE MAC         2.2W
IDT75P52100   TCAM            10W
IDT71V2556S   SRAM            2W
IDT72V36100   FIFO            (1)
VSC9182       Switch fabric   <16W
XC2V3000      FPGA            ≈2.5W
1. Data not available
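As a quick sanity check of this claim, the Table 20 figures can be totaled and compared with the roughly 10 watts practically available from a PCI slot. This is only a sketch: the VSC9182 value is an upper bound, the XC2V3000 value is an estimate, and the FIFO is omitted because its dissipation was unavailable.

```python
# Rough power-budget check against the ~10W practically available from a PCI slot.
# Values are the Table 20 figures; VSC9182 is an upper bound, XC2V3000 an estimate.
power_w = {
    "HDMP-1636A SerDes": 0.9,
    "HFBR-53D5 optical txcvr": 0.63,
    "L8104 GbE MAC": 2.2,
    "IDT75P52100 TCAM": 10.0,
    "IDT71V2556S SRAM": 2.0,
    "VSC9182 switch fabric": 16.0,   # upper bound
    "XC2V3000 FPGA": 2.5,            # approximate
}
total = sum(power_w.values())
print(f"total board power ~ {total:.2f} W")   # ~34.23 W
print(f"exceeds PCI budget: {total > 10.0}")  # True
```

Even with the FIFO excluded and the bounds read generously, the total is several times the PCI budget, which is why the external adapters are needed.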
Chapter 7 Design and Analysis of Signaling Networks
While much attention has been paid to the definition and implementation of signaling
protocols, little attention has been given to the transport mechanisms for sending signaling
messages between circuit switches. The RSVP-TE specification [15] allows for the signal-
ing messages to be carried directly in IP packets, but it does not constrain the links/paths
used to transport these IP packets in any way.
Most vendors have allowed for two options for the signaling “links”: (i) in-band sig-
naling channels, which are realized as bandwidth set aside within user-plane interfaces
between two switches, e.g., the Data Communication Channel (DCC) within each SONET
signal, and (ii) out-of-band signaling channels, e.g., paths through the Internet accessed
via an Ethernet interface from the control (call) processor of the switch. Service providers
are given the flexibility to choose between these two options. In this chapter, we compare
these two options and provide some quantitative guidance to service providers on the
question of which option to use.
Our comparison mainly focuses on the call setup delay, which includes (i) signaling
message processing delays, (ii) round-trip propagation delay, and (iii) signaling message
emission delays. In Chapter 5, we introduced an FPGA-based hardware signaling acceler-
ator, which reduces the signaling message processing overhead to the order of microsec-
onds. The second component, round-trip propagation delay, cannot be reduced due to
speed-of-light constraints. The goal of this work is to reduce the third component, emis-
sion delays, without of course compromising utilization.
7.1 Introduction to in-band signaling and out-of-band signaling
Fig. 61 illustrates the two options for signaling transport: in-band signaling and out-
of-band signaling. In the in-band signaling option, the signaling traffic shares the same
channel as the data traffic. For example, the DCC channel in SONET signal can be used to
transport signaling messages, as shown in Fig. 62a. However, the bandwidth of the DCC
channel is limited. For each separate OC-1 signal, the Section DCC (SDCC) has a band-
width of 192Kbps, and the Line DCC (LDCC) has a bandwidth of 576Kbps.
There are two alternatives that can overcome the bandwidth limitation of the DCC
channel. In the first case, a switch interface is partitioned into signaling bandwidth and
user-plane bandwidth. For example, one or more OC-1s can be set aside within each user-
plane interface to exclusively carry signaling messages (e.g., Fig. 62b shows an OC-1 sig-
naling channel within an OC-12 interface). In the second case, one or more OC-1s of one
user-plane interface between two switches, which are connected by N interfaces, are set
Figure 61. In-band and out-of-band signaling options. (a) In-band signaling. (b) Out-of-band signaling through an IP network.
Figure 62. Three cases of in-band signaling. (a) In-band signaling case 1. (b) In-band signaling case 2. (c) In-band signaling case 3.
aside for signaling, as shown in Fig. 62c. The disadvantage of these two alternatives is that
there is revenue loss due to the allocation of user-plane bandwidth to signaling.
Fig. 61b illustrates the out-of-band signaling option. The key difference is that the path
taken by the signaling channel can be different from the path taken by the user-plane inter-
faces that it supports. For example, the signaling channel could pass through packet
switches while the user-plane interfaces are direct (logical or physical) between two
switches. In other words, the signaling channel is separate from and independent of the
data channel. A classical example of out-of-band signaling is the SS7 network, which is
used to carry signaling messages between DS0-based telephone circuit switches. SS7 is a
connectionless packet-switched technology.
Unlike in telephone networks, where it was economically feasible to create a dedicated
connectionless packet-switched network just for signaling traffic, in today's environment,
with the ubiquity of the Internet, it is more likely that GMPLS-enabled SONET/SDH/
WDM circuit switches will leverage the Internet for signaling message transport. A ser-
vice provider could simply connect the control processors of their GMPLS-enabled circuit
switches to the Internet and expect it to route signaling messages as needed. A potential
drawback is latency. We create models for such out-of-band signaling channels routed
through the Internet, and analyze these models to predict performance.
7.2 Delay models
In this section, we set up and analyze models for the two signaling transport options,
in-band signaling and out-of-band signaling. Our goal is to compute the total delay
incurred in processing and sending a signaling message from one switch to the next suc-
cessfully. This consists of the following components: (i) queueing delay plus service time
at the signaling processor, (ii) queueing delay plus transmission time on the signaling
channel, (iii) total delay to successfully send the message on the signaling channel, which
includes retransmissions in case of errors. We list our assumptions and notation, and then
describe our queueing model.
7.2.1 Assumptions
We assume that the call arrival process is Poisson, which has been shown to hold true
for FTP applications in [53]. Each call involves multiple signaling messages. For example,
in RSVP-TE, each call involves Path messages, Resv messages, and PathTear/ResvTear
messages. We assume the arrival of signaling messages is also Poisson. Although the mes-
sages involved in a call are related, we argue that since the duration of a call is random, the
arrival times of PathTear/ResvTear messages are independent of those of Path and Resv
messages. Besides, after a switch sends out a Path message, the Path message may go
through a random number of downstream switches before reaching the destination. The
time to the reception of a Resv message following the transmission of a Path message is
thus randomized.
The service times at signaling processors and the transmission times at signaling chan-
nels are both assumed to be deterministic. The processing of different RSVP-TE signaling
messages takes different amounts of time, with the Path message consuming the most. However, in
Chapter 5, we introduced a pipelined architecture for hardware-acceleration of RSVP-TE
signaling message processing. In order to achieve full pipelining and the highest through-
put, we purposely inserted dummy cycles when processing Resv, PathTear, and ResvTear
messages so that the processing of all four messages takes the same number of clock
cycles. Even without such an implementation, the difference in processing times is not
significant for different message types. As for the sizes of signaling messages, the Path and
Resv messages are roughly equivalent in size, while the PathTear and ResvTear messages are
roughly equivalent in size. The approximate size is a few hundred bytes. Therefore we assume
that signaling processing times and signaling transmission times are deterministic.
We assume that all messages fit in one packet, since the maximum transmission unit
size of most existing networks is larger than the size of an RSVP-TE message.
7.2.2 Notation
Our notation is shown in Table 21. In the rest of this chapter, the superscripts OOB and
IB may be used on some of these parameters as appropriate for out-of-band and in-band,
respectively.
7.2.3 Queueing model
There are two servers related to signaling at a GMPLS-enabled circuit switch: (i) sig-
naling protocol processor, and (ii) signaling channel transmitter. We assume that the
Table 21. Notation.
Symbol   Meaning
λ        Aggregate signaling message arrival rate
µproc    Service rate of the signaling processor
µtx      Service rate of the signaling channel transmitter
n        Number of GMPLS-enabled neighbors of a switch (also the number of in-band signaling channels)
qj       Probability of sending an outgoing signaling message to the jth in-band signaling channel; j = 1, …, n
Tproc    A random variable denoting the response time at the signaling protocol processor (waiting time plus service time)
Ttx      A random variable denoting the response time at the signaling channel transmitter (waiting time plus service time)
Tn       One-way network delay in sending a signaling message from one switch to the next
To       Initial time-out value of the retransmission timer at the sender
p        Probability of packet loss
switch has a single signaling protocol processor irrespective of whether the signaling link
solution is in-band or out-of-band. We first describe our queueing model for the signaling
protocol processor and the signaling channel transmitter without considerations of mes-
sage loss and subsequent re-transmissions. We then improve this model by adding in the
possibility of message loss and re-transmissions.
7.2.3.1 Model without retransmissions
Fig. 63 illustrates our queueing models of these two servers for both in-band and out-
of-band solutions. In the in-band solution, there are n signaling channels, given our
assumption of n GMPLS-enabled neighbors to the switch (see Table 21). In the out-of-
band solution, we assume that there is only one out-of-band signaling channel.
Given that the models shown in Fig. 63 are networks of queues, we consulted the liter-
ature [54] on modeling such queueing networks. Most of the solutions for queueing net-
works, such as Burke's theorem and Jackson's theorem, are for a network of M/M/1
queues, while in our case, the service times of both queues are deterministic.
Even though there is no analytical derivation for a network of queues with determinis-
tic service times (to our knowledge), it turns out that if we take into account practical con-
siderations, this network of queues reduces to a single M/D/1 queue. Our reasoning is as
follows. The service rate of the signaling processor, µproc, is determined by the switch

Figure 63. Queueing models of the signaling protocol processor and the signaling channel transmitters. (a) In-band signaling: the processor (rate µproc) feeds n transmitter queues with rates µtx^IB1, …, µtx^IBn, selected with probabilities q1, …, qn, where the qi sum to 1. (b) Out-of-band signaling: the processor feeds a single transmitter queue with rate µtx^OOB.
vendor, while the service rate of the signaling channel, µtx, either in-band or out-of-band,
is determined by the service provider. We expect the switch vendor to select µproc so that
the user-plane interfaces and the signaling protocol processor are equally utilized. Once
this decision is made and a switch is manufactured, µproc is fixed. A service provider
who purchases this switch then has only one degree of freedom to choose an appropriate
bandwidth level for the signaling channels, i.e., the service provider can select µtx.
The service provider should limit µtx ≤ µproc, because choosing a µtx larger than µproc
is an unnecessary waste of bandwidth. Therefore we attempt to solve the network-of-
queues model in Fig. 63 for only two cases, µtx = µproc and µtx < µproc. We further
limit the second case to only µtx ≪ µproc. Reasoning that a service provider may choose
µtx < µproc for cost reasons, we further argue that in this case it is more likely that
µtx ≪ µproc, because cost differentials for minor drops in transmission rates are often not
significant (e.g., compare the cost of an 800Mbps circuit vs. a 1Gbps circuit). Clearly the
service provider should recognize that in choosing this option instead of the µtx = µproc
option, the network is being planned for a lower call arrival rate than the maximum rate
for which the switch is designed.
For these two cases, µtx = µproc and µtx ≪ µproc, the queueing network model of
Fig. 63 is reduced to a single M/D/1 queue. This is because under our assumption that the
service time at the signaling protocol processor is deterministic, the departure rate is no
more than µproc. If µtx = µproc, the second server (signaling channel transmitter) can
keep up with the arriving signaling messages. In other words, the second queue is always
empty, and the delay at the second queue is solely the service time (or emission time). On
the other hand, if µtx ≪ µproc, since λ ≤ µtx for the system to be stable, λ ≪ µproc, which
implies that the first queueing system is lightly loaded. We therefore assume that when a
signaling message arrives, it is served immediately and the first queue is always empty.
For these two cases, µtx = µproc and µtx ≪ µproc, the queueing network model of
Fig. 63 is reduced to a single queue. The mean response time for a signaling message at a
switch, E[Tsw], is given by:

E[Tsw] = E[Tproc] + 1/µtx,    if µtx = µproc      (1)

E[Tsw] = E[Ttx] + 1/µproc,    if µtx ≪ µproc      (2)

where E[Tproc] is the mean response time at the signaling protocol processor queue, and
E[Ttx] is the mean response time at the signaling channel transmitter queue (see Fig. 63).
The above reasoning applies directly to the out-of-band case (Fig. 63b), since in this
case there is only one signaling channel. In the in-band solution, a similar reasoning holds
if the individual signaling channel rates are selected according to the probabilities qj (see
Fig. 63 and Table 21), i.e., µtx^IBj = qj·µproc, j = 1, 2, …, n. Using the argument stated
before, we assume that the service provider will either choose all the in-band signaling
channel rates such that µtx^IBj = qj·µproc, or choose rates that are much lower,
µtx^IBj ≪ qj·µproc. Under these conditions, our earlier reasoning of treating the queueing
network as a single queue holds.

7.2.3.2 Model including re-transmissions
In this model, we include the third component of the delay, which is Tn, the one-way
network delay (see Table 21). This delay is impacted by losses and re-transmissions. Mes-
sage loss is possible irrespective of whether the signaling channel is in-band or out-of-
band. The reasons for message loss in the in-band case are bit- and burst-errors on
the links, and receive-buffer overflows due to flow control problems. Bit- and burst-link
errors arise due to noise and interference on the physical media. Even though optical fiber,
the physical medium of these high-speed circuit-switched networks, is fairly reliable, link
errors are unavoidable. Receive-buffer overflows will occur if the signaling protocol pro-
cessor at the sending switch is faster than that at the receiving side. Since different switch
vendors could use different implementation techniques for the signaling protocol proces-
sor, this flow control problem may arise. In the out-of-band solution, an additional cause
of message loss is buffer overflow at any of the routers on the path between the sending
and receiving switches.
Since RSVP-TE is specified to run directly over raw IP, which does not offer a reliable
service, RFC 2961 [40] proposes an exponential back-off retransmission algorithm to pro-
vide reliability.
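The doubling behavior of this retransmission algorithm can be sketched as follows. This is a minimal illustration of the exponential back-off idea, not RFC 2961's exact procedure; the time-out value and retry limit below are arbitrary examples.

```python
# Sketch of an exponential back-off retransmission schedule: resend after T0,
# then 2*T0, 4*T0, ..., until acknowledged or a retry limit is reached.
# Timings and limits are illustrative, not RFC 2961's recommended defaults.
def retransmission_schedule(t0: float, max_retries: int) -> list[float]:
    """Absolute send times (first transmission at t=0) assuming no ack arrives."""
    times, t, interval = [0.0], 0.0, t0
    for _ in range(max_retries):
        t += interval
        times.append(t)
        interval *= 2  # exponential doubling of the time-out
    return times

print(retransmission_schedule(t0=0.5, max_retries=4))
# [0.0, 0.5, 1.5, 3.5, 7.5]
```

The k-th retransmission thus occurs at (2^k − 1)·T0, which is exactly the weighting that produces the p/(1 − 2p) term in the delay model of the next section.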
Assuming that retransmissions will, in typical implementations, join the signaling
channel transmitter queue, and that it is valid to rewrite (1) to be the same as (2) (in other
words, for both cases, µtx = µproc and µtx ≪ µproc, we use (2)), we re-draw the model
for the second queue (the signaling channel transmitter) as shown in Fig. 64. With proba-
bility p, a message is lost, which means a time-out occurs, and the message is re-transmit-
ted. To model exponential doubling, we show the time-out value as a function f of the
initial time-out value To.

Figure 64. Signaling channel transmitter model including re-transmissions. (Arrivals at rate λ join the transmitter queue; with probability 1 − p a message is delivered, and with probability p it times out after f(To) and re-enters the queue.)
The mean response time at the transmitter queue, E[Ttx], is the sum of the mean waiting
time, E[W], and the service delay (which is a constant, 1/µtx, given our assumptions).
The mean waiting time, E[W], can be computed assuming that the message arrival rate at
the transmitter queue is given by:

λ + λp + λp² + λp³ + … = λ/(1 − p)      (3)

but for small p, this can be approximated to λ.
Since in RSVP-TE the signaling protocol processor is also involved in handling re-
transmissions, the same load appears at both the signaling protocol processor and the sig-
naling channel transmitter. Thus, re-writing (1) to be the same as (2) is not affected
by retransmissions.
Using µproc, Tn, Ttx, T0, and p (see Table 21), and following the exponential back-
off re-transmission algorithm, we determine the mean total time to process and send the
outgoing signaling message successfully to the next-hop switch as:

E[T] = 1/µproc + Tn + E[Ttx]·((1−p) + 2p(1−p) + 3p²(1−p) + …)
       + T0·(p(1−p) + 3p²(1−p) + 7p³(1−p) + …)      (4)

The explanation for the above equation is as follows. The first two terms, 1/µproc and
Tn, correspond to the processing delay and the time to transmit a message successfully
one-way from the sending switch to the receiving switch. The next term corresponds to the
mean response time (mean waiting time plus mean service time) in the signaling
channel transmitter queue. This delay is incurred once if the first transmission is success-
ful (probability 1−p), twice if the first transmission is unsuccessful but the second
transmission is a success, and so on. The last term in (4) corresponds to the time-out value,
which is doubled for every re-transmission. Summing the series yields:

E[T] = 1/µproc + Tn + (1/(1−p))·E[Ttx] + (p/(1−2p))·T0      (5)

Assuming T0 = 3Tn (counting round-trip network delay and other minor delays), (5)
can be re-written as:

E[T] = 1/µproc + (1/(1−p))·E[Ttx] + ((1+p)/(1−2p))·Tn      (6)

Using the mean response time of an M/D/1 queue for E[Ttx], we have:

E[T] = 1/µproc + (1/(1−p)) · ((1 − ρ/2)/(1 − ρ)) · (1/µtx) + ((1+p)/(1−2p))·Tn      (7)

where ρ = λ/µtx. The first term, 1/µproc, is common to both in-band signaling and
out-of-band signaling. Therefore, p, µtx, and Tn determine which is better, in-band sig-
naling or out-of-band signaling, under different sets of conditions. We consider different
numerical values for these input parameters and compare the two signaling transport solu-
tions.
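Equation (7) is straightforward to evaluate numerically. The sketch below plugs in the hardware-signaling, metro-area values of Table 22 (µproc = µtx = 200,000/sec; Tn of 0.2ms in-band vs. 0.5ms out-of-band; p of 10^-4 vs. 1%) at an assumed 50% transmitter load; the function and parameter names are ours, not part of the dissertation's implementation.

```python
# Numerical sketch of equation (7): mean total delay to process and send a
# signaling message = processing time + M/D/1 response time at the transmitter
# (scaled for retransmissions) + the network-delay term.
def mean_total_delay(lam, mu_proc, mu_tx, t_n, p):
    rho = lam / mu_tx                        # transmitter utilization
    assert rho < 1 and p < 0.5               # stability; (1 - 2p) must be positive
    mdl = (1 - rho / 2) / (1 - rho) / mu_tx  # M/D/1 mean response time
    return 1 / mu_proc + mdl / (1 - p) + (1 + p) / (1 - 2 * p) * t_n

lam = 0.5 * 200_000                          # assumed load: rho = 0.5
in_band  = mean_total_delay(lam, 200_000, 200_000, 0.2e-3, 1e-4)
out_band = mean_total_delay(lam, 200_000, 200_000, 0.5e-3, 0.01)
print(f"in-band     : {in_band * 1e3:.3f} ms")
print(f"out-of-band : {out_band * 1e3:.3f} ms")  # in-band is smaller here, as in Fig. 65
```

With these inputs the Tn term dominates, so the in-band option wins, matching the metro-area conclusion drawn from Fig. 65.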
7.3 Numerical results and implications
7.3.1 Input parameter values
With the hardware signaling accelerator introduced in Chapter 5, RSVP-TE signaling
messages can be processed within several microseconds. Accordingly, we assume µproc is
200,000/sec for a hardware signaling accelerator and 200/sec for a software signaling
processor.
For hardware-accelerated signaling, when µtx = µproc, µtx is 200,000/sec; when
µtx ≪ µproc, we assume µtx = (1/5)µproc, or 40,000/sec. For software signaling, even
when µtx = µproc, µtx is 200/sec, or 200Kbps (assuming 125 bytes or 1000 bits per

Table 22. Input parameter values.
Symbol   Value
λ        Varied from 0.01µtx to 0.95µtx
µproc    200,000/sec, hardware signaling; 200/sec, software signaling
µtx      200,000/sec, hardware signaling, µtx = µproc;
         40,000/sec, hardware signaling, µtx ≪ µproc;
         200/sec, software signaling
n        5 and 10. For the purpose of comparison, the aggregate bandwidths of in-band
         and out-of-band signaling are the same, therefore µtx^IBj = µtx^OOB/n
qj       Probability of sending an outgoing signaling message to the jth in-band
         signaling channel; j = 1, …, n
Tn^IBj   0.2ms in metro area; 25ms in wide area
Tn^OOB   0.5ms and 1ms in metro area; 35ms and 45ms in wide area
To^IBj   3Tn^IBj
To^OOB   3Tn^OOB
p^IBj    10^-8 and 10^-4
p^OOB    1% and 5%
message). Considering the fact that even a single SDCC channel has a bandwidth of
192Kbps, assigning a µtx value less than 200/sec is impractical. Therefore we do not
consider µtx ≪ µproc for software signaling.
In order to choose reasonable values for Tn and p, we refer to the National Laboratory
for Applied Network Research (NLANR) [55] for Round-Trip Time (RTT) and packet
loss rate measurements. We randomly chose North Carolina State University (NCSU) as our
starting point. For the data collected on June 14, 2004, we noticed that the RTT between
NCSU and the North Carolina Networking Initiative (NCNI) is 1ms. We used the corre-
sponding one-way delay, 0.5ms, for our metro setting. The distance between these two
locations is 20.6 miles; the propagation delay is around 0.16ms. We also notice that over a
similar geographical distance (22.6 miles, to be accurate), the RTT between NCSU and the
University of North Carolina, Chapel Hill (UNC) is 2ms, or 1ms one-way. The differ-
ent RTTs may result from different network conditions. Based on the data obtained from
NLANR, we choose 0.5ms and 1ms for Tn^OOB in the metro area. We choose 0.2ms for
Tn^IBj, since this delay is mainly propagation delay. In the wide area, for example from
NCSU to locations in California, the RTTs range from 75ms (California Institute of Tech-
nology, CIT) to 89.99ms (San Diego State University, SDSU). We choose 35ms and
45ms for Tn^OOB, and 25ms for Tn^IBj (network delay). The reason we allow for both a
metro-area and a wide-area setting is that some service providers may connect two signal-
ing-enabled switches located across a wide area with logical links, making them neighbors
from a signaling point of view. In the telephone network, for example, many carriers do
interconnect signaling-enabled switches in California directly via logical links (for the
user-plane interfaces) with switches in New York.
We choose 1% and 5% for p^OOB. As we explained before, even for in-band signaling,
bit- and burst-link errors and buffer overflows may cause packet loss. For p^IBj, we choose
10^-4 and 10^-8.
In the following subsections, we show the delays under different scenarios. In
each case, we vary the aggregate signaling message arrival rate λ from 0.01µtx to 0.95µtx,
to show the delays under different loads.

7.3.2 Hardware-accelerated signaling engine, µtx = µproc

Figure 65. In-band/out-of-band signaling with hardware signaling, metro area, µtx = µproc.
From Fig. 65, we can observe that in the metro area, when hardware-accelerated sig-
naling is used and µtx = µproc, in-band signaling always outperforms out-of-band sig-
naling. Referring to (7), the first two terms are relatively small (on the order of µs) when
compared to Tn, which is on the order of ms. Thus, the total delay is mainly determined by
Tn. Since Tn^IBj is less than Tn^OOB, the in-band solution does a better job.
According to (7), when p is small, its impact on the total delay is negligible. For in-
band signaling, the values of p we use are 10^-8 and 10^-4 (see Table 22). In Fig. 65 and
all following figures, we cannot observe a notable difference between the in-band signaling
cases for p^IBj = 10^-4 and p^IBj = 10^-8.
Fig. 66 shows the total delay in the wide area. Here, the total delay is even more dom-
inated by Tn, which is significantly higher than the first two terms of (7), and hence in-
band signaling outperforms out-of-band signaling.

Figure 66. In-band/out-of-band signaling with hardware signaling, wide area, µtx = µproc.
7.3.3 Hardware-accelerated signaling engine, µtx ≪ µproc
When µtx ≪ µproc, for hardware-accelerated signaling, the first term of (7) is still neg-
ligible, but the queueing delay at the transmitter (i.e., the second term of (7)) becomes
comparable to the third term in the metro-area setting, where Tn is in the hundreds of µs.
When ρ is high, the factor (1 − ρ/2)/(1 − ρ) starts increasing to the point where the sec-
ond term dominates the total mean delay. In this case out-of-band signaling outper-
forms in-band signaling, because µtx is five times higher in the out-of-band case than in
the in-band case (see Table 22). In the wide area, Tn still dominates, and hence the lower
rate of the in-band channel is compensated by the increased Tn of the out-of-band solu-
tion.

Figure 67. In-band/out-of-band signaling with hardware signaling, metro area, µtx ≪ µproc.
7.3.4 Software signaling processor
From Fig. 69, we can observe that out-of-band signaling outperforms in-band signal-
ing in the metro area when software signaling processors are used. This is because the
message processing delay and emission delays (the first two terms of (7)) dominate the total
delay.
In the wide area (Fig. 70), the choice between in-band and out-of-band signaling really
depends on the choice of parameters. When Tn = 35ms and p = 1%, out-of-band sig-
naling demonstrates the best performance. But we do note that the RTT measurements
obtained from the NLANR site were made primarily on Internet2, which is lightly loaded.
On the Internet, RTTs are likely to be significantly higher. In this case, even with software
signaling processors, the network delay may dominate, making in-band signaling a better
choice.

Figure 68. In-band/out-of-band signaling with hardware signaling, wide area, µtx ≪ µproc.
7.4 Some comments
We compared the in-band and out-of-band signaling transport options under the assump-
tions of the switches having hardware signaling accelerator engines or software signaling
processors. With a hardware signaling accelerator, if the bandwidth allocated for the in-band
signaling channels is on par with the sped-up processing rates, then the main source of
delay in sending a signaling message is the network delay. Using an out-of-band path
through the public Internet could make this component significant and thus obliterate the
gains in call setup delays made with hardware signaling accelerators. In this case, fast in-
band signaling channels are required. For the parameters we considered, this required set-
ting aside one OC-1 for each neighbor switch, to which there could potentially be a large
number of interfaces, each at a multi-OC-1 rate. On the other hand, with software signaling
processors, we found that network delays matter much less, and hence the cheaper out-of-
band public Internet option is sufficient in the metro area. But in the wide area, in-band
signaling is typically a better option.

Figure 69. In-band/out-of-band signaling with software signaling, metro area.
Figure 70. In-band/out-of-band signaling with software signaling, wide area.
Chapter 8 Applications of Hardware-Accelerated
Signaling
The impact of hardware-accelerated signaling is quite far-reaching. It will fundamen-
tally change our current view of connection-oriented networks. With the call processing
delay per switch decreasing to the order of microseconds, and correspondingly the call
handling rate increasing to the order of 10^5 calls/sec, connections can be established and
released much faster, and switches can handle more requests and maintain more simultaneous
connections. This allows for establishing end-to-end connections even for applications
with short holding times, achieving better QoS while maintaining good resource utiliza-
tion. Furthermore, a new network architecture is suggested, in which multiple distributed
small switches replace a large switch. This new architecture leads to cheaper networks
with better utilization. Another example is the application of hardware-accelerated signal-
ing in network survivability. In this chapter, we briefly discuss these two applications. We
try to shed fresh light on the possible applications of hardware-accelerated signaling. This
is only a starting point. We expect more work in the future on applications that will lever-
age the advantage of connection-oriented networks given this significant technology
advance of hardware-accelerated signaling.
8.1 Utilization analysis of circuit-switched networks for file transfers
8.1.1 Problem statement
In [5], we propose to use end-to-end circuits for large file transfers. In [57], we further
investigated the possibility of applying this idea in Storage Area Networks (SANs). A file
transfer has no intrinsic burstiness, which makes it suitable for circuit-switched networks.
Once an end-to-end circuit is established, a large file can be transferred at the constant rate
of the circuit. For example, transferring a 1GB MPEG4 video file over a 1Gbps circuit
takes around 8 seconds, while it takes only 800ms on a 10Gbps circuit.
The higher the call setup delay/transfer delay ratio, the lower the utilization. If pure
software-based signaling is used, the call setup delay per switch is on the order of a few
milliseconds. End-to-end call setup delays along paths traversing multiple switches can
add up to tens to hundreds of milliseconds. This signaling overhead is negligible com-
pared to the 8-second transfer time for a 1GB file on a 1Gbps circuit but becomes sig-
nificant when the circuit rate is 10Gbps. Furthermore, when the file size is small, the
signaling overhead leads to poor circuit utilization. For example, transferring a 5MB
MP3 music file over the same 1Gbps connection requires only around 40ms, which is
significant when the call setup delay is on the order of tens to hundreds of milliseconds.
Therefore, from a circuit utilization perspective, establishing an end-to-end circuit is
justifiable only for those applications with long holding times. For the file transfer
application, we define a crossover file size: only those files larger than the crossover
file size should be transferred on end-to-end circuits.
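The crossover idea can be made concrete with a short calculation: per-circuit utilization is transfer time divided by setup-plus-transfer time, so for a target utilization the crossover size follows directly. The 1Gbps rate and 90% utilization target below are illustrative assumptions, not values fixed by the analysis.

```python
# Per-circuit utilization = transfer time / (call setup delay + transfer time).
# Sketch of how the crossover file size falls as signaling gets faster;
# the 1 Gbps circuit and 90% utilization target are illustrative.
def per_circuit_utilization(file_bytes, rate_bps, setup_s):
    transfer = file_bytes * 8 / rate_bps
    return transfer / (setup_s + transfer)

def crossover_size(rate_bps, setup_s, target=0.9):
    # transfer/(setup+transfer) >= target  =>  transfer >= setup*target/(1-target)
    return setup_s * target / (1 - target) * rate_bps / 8

for setup in (100e-3, 1e-3, 10e-6):  # slow software, fast software, hardware
    print(f"setup {setup * 1e3:8.3f} ms -> crossover "
          f"{crossover_size(1e9, setup) / 1e6:10.5f} MB")
# crossover drops from ~112.5 MB at 100 ms setup to ~0.011 MB at 10 µs
```

Cutting the setup delay by four orders of magnitude shrinks the crossover size by the same factor, which is exactly why hardware-accelerated signaling opens circuits to small transfers.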
Our discussion above focused on only one aspect of utilization, i.e., per-circuit utiliza-
tion. It addresses the issue of whether the circuit, once reserved, is actively in use (trans-
ferring a file) or whether it is simply being held, waiting for the call setup phase to
complete. There is another aspect of utilization, which we call aggregate network utiliza-
tion. It determines whether or not a circuit is occupied. Circuit-switched networks are typ-
ically operated in call-blocking mode. The Erlang-B formula is widely used to model call
blocking probabilities in such networks. With fixed blocking probabilities, the Erlang-B
formula shows that the higher the offered traffic load, the higher the utilization (see Fig. 71;
also observe that the higher the blocking probability, the higher the utilization). When
applied to file transfers, the offered load is determined by the crossover file size. From
Fig. 72 we can see that the larger the crossover file size, the lower the effective offered
load.
In order to improve the per-circuit utilization, we need to limit the use of circuits to
large files. This results in low traffic load on the circuits, which, in turn, leads to low
aggregate network utilization. Hardware-accelerated signaling is key to addressing this
impasse. It effectively reduces the call setup overhead, making the per-circuit utilization
acceptable even for a small crossover file size, which, in turn, leads to a higher load and
hence higher aggregate network utilization. In this section, we analyze the impact of hard-
Figure 71. Aggregate network utilization vs. offered traffic load
((1 − Pb) × ρ / m, with Pb fixed and m varying, ρ = Erlang(Pb, m)).

Figure 72. Crossover file size vs. percentage of the offered load
(Pareto distribution, α = 1.06, k = 1000).
ware-accelerated signaling on the total network utilization, i.e., the product of both per-
circuit utilization and aggregate network utilization. Table 23 lists the notation we use in
the analysis.
8.1.2 Utilization analysis and the numerical results
Circuit-switched networks, such as telephone networks, typically work in call-block-
ing mode. If resources are available, the call is accepted; if resources are not available, the
call is rejected or blocked. Equation (8) shows the Erlang-B formula used to model the call
blocking probability.
Pb = (ρ^m / m!) / ( Σ_{k=0}^{m} ρ^k / k! )        (8)
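Equation (8) is awkward to evaluate directly because of the factorials; the standard recursion B(0, ρ) = 1, B(k, ρ) = ρ·B(k−1, ρ) / (k + ρ·B(k−1, ρ)) is numerically stable and gives the same result. A sketch (the trunk sizes and loads below are illustrative, not values from the text):

```python
def erlang_b(m, rho):
    """Blocking probability for offered load rho (Erlangs) on m circuits,
    computed with the stable recursion equivalent to the factorials in (8)."""
    b = 1.0                          # B(0, rho) = 1
    for k in range(1, m + 1):
        b = rho * b / (k + rho * b)
    return b

# Heavier offered load on the same trunk group -> higher blocking,
# but also higher aggregate utilization (1 - Pb) * rho / m, as in (9).
for rho in (20, 32, 40):
    pb = erlang_b(32, rho)
    print(rho, round(pb, 3), round((1 - pb) * rho / 32, 3))
```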
Table 23. Notation.

Symbol       Meaning
µa           Aggregate network utilization
µc           Per-circuit utilization
µ            Total network utilization
Pb           Call blocking probability
ρ            Offered traffic load
m            Number of circuits
E[X]         Average file size
rc           Circuit rate
ρ′           Fractional offered load
χ            Cross-over file size
E[Tsetup]    Average call setup delay
Tprop        Round-trip propagation delay
Figure 73. A model for a circuit-switched node operated in call-blocking mode
(offered load ρ enters; the carried load (1 − Pb) × ρ is spread over circuits 1…m;
the load Pb × ρ is blocked).
In Fig. 73, the offered load is ρ Erlang and the number of available circuits is m
(the Erlang is the unit of traffic load: 1 Erlang = 1 call/sec × 1 sec/call). Since the blocking
probability is Pb (given in (8)), the effective load is (1 − Pb) × ρ. This effective load is
equally partitioned among the m circuits. Hence the aggregate network utilization, µa, is
given by:

µa = (1 − Pb) × ρ / m        (9)

Since we restrict the use of circuits to only files larger than some crossover file size,
χ, we can compute the fractional offered load ρ′ and the average file size E[X | X ≥ χ] if
we know the distribution of file sizes. Reference [56] suggests a Pareto distribution for
file sizes. Using this distribution, we compute the fractional offered load as:

ρ′ = (ρ / E[X]) × P(X ≥ χ) × E[X | X ≥ χ]
   = ρ × ((α − 1) / (α k)) × (k/χ)^α × (α χ / (α − 1))
   = ρ (k/χ)^(α−1)        (10)

where α, the shape parameter, is 1.06, k, the scale parameter, is 1000 bytes, as com-
puted in [56], and ρ is the total offered load.

As mentioned earlier, the total network utilization has two components: aggregate net-
work utilization µa and per-circuit utilization µc. Per-circuit utilization, µc, is given by:

µc = E[Ttransfer] / (E[Tsetup] + E[Ttransfer]), where E[Ttransfer] = E[X] / rc        (11)

Combining the two components of utilization, we obtain the total utilization µ as:

µ = µa × µc = ((1 − Pb) × ρ′ / m) × (E[X | X ≥ χ] / rc) / (E[Tsetup] + E[X | X ≥ χ] / rc)        (12)

We plot the total utilization in Fig. 74 for different Pb, ρ and Tprop. As the crossover
file size χ increases, the plots show utilization increasing because of the second factor,
i.e., the per-circuit utilization increases. However, the drop in the offered load and the cor-
responding drop in the aggregate utilization slow the increase of the total utilization, mak-
ing it level off at some value below 1 or even drop slightly. The overall effect is that the
total utilization reaches a maximum value at some crossover file size. For a blocking
probability of 0.3, this size is less than 0.5 MB in the metro area (Tprop = 0.1 ms), while
in the wide area (Tprop = 50 ms) it is around 1.5 MB. When the blocking proba-
bility is reduced to 0.01, both sizes are even smaller.

Figure 74. Crossover file size vs. total utilization (hardware signaling).

For the purpose of comparison, we plot the total utilization for pure software-based
signaling in Fig. 75. Because of the higher call setup overhead, the per-circuit utilization is
poor for small crossover file sizes, and the total utilization reaches its maximum value at a
larger crossover file size: for a blocking probability of 0.3, the size is 7 MB, and for a block-
ing probability of 0.01, it is around 3 MB. A larger crossover file size means that fewer
files can be transferred over end-to-end circuits (refer to Fig. 72).

Figure 75. Crossover file size vs. total utilization (software signaling).
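Equations (9)-(12) can be combined in a few lines to reproduce the shape of Figs. 74-75: utilization first rises with the crossover file size and then levels off or drops. A sketch; the trunk size m = 32, offered load ρ = 40 Erlangs, circuit rate 1 Gbps, and setup delay 1 ms are illustrative assumptions, not values from the text — only α = 1.06 and k = 1000 bytes come from [56]:

```python
def erlang_b(m, rho):
    # Stable recursion for the Erlang-B blocking probability of (8).
    b = 1.0
    for k in range(1, m + 1):
        b = rho * b / (k + rho * b)
    return b

ALPHA, K = 1.06, 1000.0      # Pareto shape/scale parameters from [56] (bytes)

def total_utilization(chi, rho, m, rate_bps, setup_s):
    # (10): fractional offered load from files larger than chi
    rho_f = rho * (K / chi) ** (ALPHA - 1)
    pb = erlang_b(m, rho_f)
    mu_a = (1 - pb) * rho_f / m                  # (9) aggregate utilization
    ex_cond = ALPHA * chi / (ALPHA - 1)          # E[X | X >= chi] for a Pareto
    transfer = ex_cond * 8 / rate_bps            # seconds to send the average file
    mu_c = transfer / (setup_s + transfer)       # (11) per-circuit utilization
    return mu_a * mu_c                           # (12) total utilization

# Illustrative scenario: 32 circuits at 1 Gbps, 1 ms end-to-end setup delay.
for chi in (1e5, 1e6, 1e7, 1e8):                 # crossover file size in bytes
    print(chi, round(total_utilization(chi, 40, 32, 1e9, 1e-3), 3))
```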
From Figs. 74 and 75, we observe that as the load ρ increases, the utilization
µ also increases. For any given utilization, we can calculate the load needed to achieve
it. In Table 24, we list the load needed to achieve a given utilization under
different scenarios. For blocking probability Pb = 0.3, we choose µ = 90%; for
Pb = 0.01, we choose µ = 80%. We must sacrifice utilization to achieve a low block-
ing probability.
The implication of these numbers is that a large number of interfaces are needed at the
circuit switches leading to the shared interface to create the aggregate load needed to
achieve the desired utilization, as shown in Fig. 76. Since most of the cost of a switch lies
in its interfaces, hardware-accelerated signaling effectively enables the cre-
ation of cheaper switches. This allows for the creation of connection-oriented networks
with an architecture more like existing data networks than today's telephone net-
works: a typical central office in a telephone network has thousands of lines (interfaces),
while an Ethernet switch in an enterprise department has 24 or 48 interfaces. The lower
per-switch call setup delay enabled by hardware signaling implies that a much larger num-
ber of hops is acceptable in such connection-oriented networks. For example, a wide-area
traceroute through the Internet shows 15-20 routers, while domestic telephone calls seldom
pass through more than 4-5 switches at which signaling processing is done.
This architecture of smaller switches (fewer interfaces) but more hops lends itself to a more
distributed and, from a per-enterprise perspective, cheaper connection-oriented network. Thus
the implications of hardware signaling are quite profound.

Table 24. Number of circuits needed for a given utilization.

                                    Hardware-accelerated signaling    Pure software signaling
Pb = 0.3,  µ = 90%   Tprop = 0.1 ms   ρ = 35                            ρ = 220
                     Tprop = 50 ms    ρ = 42                            ρ = 480
Pb = 0.01, µ = 80%   Tprop = 0.1 ms   ρ = 90                            ρ = 200
                     Tprop = 50 ms    ρ = 102                           ρ = 250

Figure 76. Pure software signaling needs more client-side interfaces to create the
aggregate load to achieve the desired utilization. (a) With hardware-accelerated
signaling. (b) With pure software signaling.
8.2 Hardware-accelerated signaling and network survivability
Network survivability is defined as the ability of a network to recover from failures.
It is a crucial requirement in high-speed networks: a single optical
fiber cut or core switch failure may cause significant data loss.
There are two mechanisms to implement network survivability: protection and restora-
tion. Protection requires a pre-assigned secondary path for each connection. When a failure
occurs, the protected connection can be switched to the secondary path quickly. Since the
secondary path is pre-assigned, no signaling procedures are involved in the switchover,
which makes it fast. However, protection typically requires 100% resource redundancy.
For example, SONET/SDH Automatic Protection Switching (APS) requires a primary path
and a secondary path for each connection; user traffic is transmitted on both paths simul-
taneously (1+1 protection). Once a failure occurs, the connection can be switched from the
primary path to the secondary path in less than 50 ms. On the other hand, restoration does
not require a secondary path to be pre-assigned. Instead, the nodes try to re-establish the
affected connection along a secondary path after a failure occurs. The route of the second-
ary path can be pre-calculated or dynamically calculated after the failure. Signaling proce-
dures are involved in restoration. The advantage of restoration is that it does not require a
pre-assigned secondary path, and hence achieves higher resource utilization; however,
the restoration time cannot be guaranteed. Both protection and restoration can be
implemented at the link level or at the path level. In link-level mechanisms, the nodes adja-
cent to the failure initiate the recovery procedures, making the recovery local.
In path-level mechanisms, the nodes adjacent to the failure notify the end points of the path
upon detecting a failure; recovery procedures are then initiated by the end hosts and take
place over a wider area.
Network survivability is one of the main applications of GMPLS signaling. Accord-
ingly, the IETF CCAMP working group is active in standardizing/extending GMPLS sig-
naling protocols for network survivability. Several IETF drafts have been proposed, [59]-
[62], defining the terminology, providing a functional description, defining the necessary sig-
naling extensions, and analyzing performance. Meanwhile, many technical papers have
been published [63]-[67] on this topic.

In this section, we analyze the implications of hardware-accelerated signaling for
network survivability. We expect hardware-accelerated signaling to enable fast restora-
tion, achieving both high utilization and low recovery delay. The notation we use is shown
in Table 25. We consider only single link failures. Although SONET/SDH APS imple-
mentations achieve a recovery delay of 50 ms, recent research suggests that the
50 ms recovery delay is not necessary. In [68], the author concludes that a 200 ms recovery
delay would not jeopardize services (data, video, voice). We therefore also set our recovery
delay target to 200 ms.
Table 25. Notation.

Symbol    Meaning
n         The number of nodes
l         The number of links
Table 25. Notation (continued).

Symbol    Meaning
d         The average degree
Hp        The average number of hops on the primary path
Hs        The average number of hops on the secondary path
Hr        The average number of hops on the link restoration ring path
Lp        The total load on the primary paths
Ls        The recovery load on the secondary paths
Lp′       The total load on the primary paths after recovery
L̄p        The average load on a link
L̄s        The average recovery load on a link
Tprop     The average propagation delay along a hop
Tsig      The average delay to send a signaling message from one node to the next
Ttotal    The total recovery delay

Fig. 77 shows a sample network with a single link failure. The network parameters are
as follows: n = 13, l = 22, Hp = 2.4, Hs = 3.5, and d = 3.4.

Figure 77. A sample network with a single link failure (nodes: SEA, SFR, LOS, SND,
LAS, DAL, HOU, CHI, ATL, MIA, BOS, NYC, WAS).

8.2.1 Utilization and delay analysis for path restoration

We assume that the traffic between all pairs of nodes is the same (normalized to 1).
There are n(n − 1)/2 bi-directional connections in a network with n nodes. Since the average
number of hops on the primary path of a connection is Hp, i.e., on average each connection
involves Hp links, if we assume all connections are equally distributed among the l links, the
average load on a single link, L̄p (by definition, it is also the number of connections car-
ried on this link), can be represented as:

L̄p = (n(n − 1)/2) × Hp / l = n(n − 1)Hp / (2l)        (13)
Once a link failure occurs, all affected connections need to be re-established. In path
restoration, the nodes adjacent to the failure notify the end points of all connections
affected by the failure. Once the failure notification is received by the end points, two proce-
dures are triggered: releasing the resources along the primary path and establishing a
connection along the secondary path. These two procedures take place for all affected con-
nections. We assume that a secondary route has been pre-calculated for each connection.
The total recovered load on these secondary paths is given by:
Ls = L̄p × Hs = n(n − 1)HpHs / (2l)        (14)
where Hs is the average number of hops along a secondary path, which is typically larger
than that along the primary path, Hp.
At the same time, the resources allocated for the affected connections on the primary
paths are released, while the resources for other connections are not affected. Accordingly,
the total remaining load on the primary paths is given by:
Lp′ = Lp − L̄p × Hp = L̄p × (l − Hp) = n(n − 1)Hp(l − Hp) / (2l)        (15)
Since path restoration takes place globally, we further assume that the restored traffic
is equally distributed among the remaining (l − 1) links (this assumption may be optimis-
tic; it is possible that not all links are involved in the restoration of a single link failure).
The average load on a link, after the restoration, is given by:
L̄s = (Lp′ + Ls) / (l − 1) = n(n − 1)Hp(l + Hs − Hp) / (2l(l − 1))        (16)
L̄p in (13) is the traffic load on a link under normal conditions; L̄s in (16) is the traffic
load on a link after restoration, which includes the unaffected normal traffic and the restored
traffic. Accordingly, the link utilization can be represented as:

µp = L̄p / L̄s = (l − 1) / (l + (Hs − Hp))        (17)

(Hs − Hp) is mainly determined by the connectivity of a network; the higher the con-
nectivity, the smaller the difference between Hs and Hp. The connectivity of a network is
related to the average degree of the network (d = 2l/n).

Using the network in Fig. 77 as an example, where l = 22, Hs = 3.5, and Hp = 2.4,
we have µp = 91%. In other words, reserving 9% of the total bandwidth on each link for
path restoration can enable the network in Fig. 77 to recover from a single link failure.
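As a check, equations (13)-(17) can be evaluated directly for the sample network of Fig. 77; a sketch reproducing the 91% figure:

```python
from math import isclose

# Sample network parameters from Fig. 77
n, l, Hp, Hs = 13, 22, 2.4, 3.5

Lp_bar = n * (n - 1) * Hp / (2 * l)      # (13) average load per link (L̄p)
Ls = Lp_bar * Hs                         # (14) recovered load on secondary paths
Lp_prime = Lp_bar * (l - Hp)             # (15) remaining load on primary paths
Ls_bar = (Lp_prime + Ls) / (l - 1)       # (16) average load per link after restoration
mu_p = Lp_bar / Ls_bar                   # (17) link utilization

print(round(mu_p, 3))                    # ~0.909, i.e. the 91% quoted in the text
assert isclose(mu_p, (l - 1) / (l + Hs - Hp))   # closed form of (17)
```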
In order to analyze the recovery delay, it is necessary to review the sequence of opera-
tions that need to be performed after a link failure. Typically there are three operations:
• Failure detection
• Failure notification
• Recovery
The nodes adjacent to a failure can detect it through Loss of Light (LOL),
Loss of Signal (LOS), Signal Degrade (SD), etc. The nodes then generate Notify messages
directed to the end points responsible for restoring the connections. The Notify message
has been added to RSVP-TE for GMPLS to provide a mechanism for failure notification.
Unlike other RSVP messages, the Notify message can be "targeted" to a node other than
the immediate upstream or downstream neighbor. All intermediate nodes simply forward the
message, unmodified, towards the target. No processing delays are incurred for Notify
messages at intermediate nodes. Therefore it is reasonable to assume that failure detection
takes no time, and that the delay for failure notification is the time to propagate the Notify mes-
sage from the node adjacent to the failure to the end point responsible for restoring the
connection. The average failure notification delay is Tprop × Hp / 2.

Once the end hosts receive the Notify messages, they initiate connection setup on the sec-
ondary paths. The total delay to re-establish a connection includes the processing and
transport of Path messages in the downstream direction, and the processing and
transport of Resv messages in the upstream direction. The average recovery delay can
therefore be calculated as Tsig × 2Hs. Following the analysis in Chapter 7, we can cal-
culate Tsig, the delay to transmit an RSVP-TE signaling message from one node to the
next, by using (7). We assume that hardware-accelerated signaling and the in-band signaling
transport option are used. Considering the network in Fig. 77, we further assume that
the RSVP-TE messages are transmitted in the wide area and that µtx ≪ µproc. From Fig. 68,
we can estimate that Tsig = 25 ms. The total recovery delay is given by:

Ttotal = (Tprop × Hp) / 2 + Tsig × 2Hs        (18)

In Fig. 77, with Tprop = 5 ms, Hp = 2.4, and Hs = 3.5, we have Ttotal = 181 ms, which
is less than 200 ms. In contrast, if a pure software signaling implementation is used,
then according to measurements provided by a leading switch vendor(1), Tsig > 300 ms.
Using the same parameters as in Fig. 77, we have Ttotal = 2106 ms, or around 2 seconds.

(1) We cannot disclose the vendor due to the Non-Disclosure Agreement.
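Using the parameters quoted above, (18) reproduces both totals with a one-line function; a sketch:

```python
def recovery_delay_ms(t_prop_ms, hp, hs, t_sig_ms):
    # (18): failure notification over half the primary path on average,
    # then Path + Resv signaling over the secondary path (2 * Hs hops).
    return t_prop_ms * hp / 2 + t_sig_ms * 2 * hs

# Parameters of Fig. 77: Tprop = 5 ms, Hp = 2.4, Hs = 3.5
print(recovery_delay_ms(5, 2.4, 3.5, 25))    # hardware signaling: 181.0 ms
print(recovery_delay_ms(5, 2.4, 3.5, 300))   # software signaling: 2106.0 ms
```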
Chapter 9 Conclusions and Future Work

Envisioning possible new applications of connection-oriented networks, we recognize the
importance of improving the call handling capacities of switches. We propose to improve
call handling capacities by implementing signaling protocols in hardware. As a proof
of concept, we design and implement OCSP, a signaling protocol specifically designed for
SONET switches, on a WILDFORCE FPGA evaluation board. We achieve a call handling
rate of 150,000 calls/sec and a per-switch call processing delay of 6.6 µs. To answer the
challenge of implementing a widely available signaling protocol in hardware, we
implement a subset of RSVP-TE on a Xilinx Virtex-II FPGA device. This subset is care-
fully defined so that most time-critical functions can be handled by hardware. Timing
simulation shows that our implementation, with a fully pipelined architecture, can achieve
a call handling rate of 250,000 calls/sec and a per-switch call processing delay of 7.2 µs.
Based on our implementations, we conclude that hardware-accelerated signaling is
feasible and demonstrates a three-orders-of-magnitude improvement in performance. Hard-
ware-accelerated signaling will fundamentally change our current view of connection-ori-
ented networks. For example, it is widely believed that an end-to-end connection is
justifiable from a utilization perspective only for applications with long call holding
times. We argue that with hardware-accelerated signaling, we can set up a connection even
for applications of short duration. We use file transfer as an example and define a cross-
over file size: if a file is larger than this crossover file size, an end-to-end connection will
be attempted. In addition, we propose a new network architecture with smaller switches but
more hops. This architecture lends itself to a more distributed and, from a per-enter-
prise perspective, cheaper connection-oriented network. Hardware-accelerated signaling can also
be used for network survivability. A dynamically established secondary path (the path restora-
tion scheme) can provide the required reliability with less resource redundancy
than 1+1 protection. And thanks to hardware-accelerated signaling, the recovery delay can
be kept within 200 ms.
To complete the analysis of end-to-end setup delays, we compare different signaling
transport options (in-band vs. out-of-band signaling). Based on the numerical
results, we conclude that for hardware-accelerated signaling, the in-band option is the
better choice; for pure software signaling, out-of-band signaling suffices in the metro area,
while in the wide area in-band signaling is again the better choice.
Our current work on the applications and implications of hardware-accelerated signal-
ing is only a starting point. The impact of hardware-accelerated signaling is quite far-
reaching. We will propose new network architectures and applications that can fully
exploit the benefit of hardware-accelerated signaling.
References
[1] http://www.internetworldstats.com
[2] http://www.whois.sc/internet-statistics/country-ip-counts.html
[3] http://abilene.internet2.edu/
[4] http://www.internet2.edu/
[5] M. Veeraraghavan, X. Zheng, H. Lee, M. Gardner, and W. Feng, “CHEETAH: Circuit-
switched High-speed End-to-End Transport ArcHitecture,” Proc. of Opticomm2003,
October 15-18, Dallas, TX.
[6] T. Russell, Signaling System #7, 2nd edition, McGraw-Hill, New York, 1998.
[7] The ATM Forum Technical Committee, “User Network Interface Specification v3.1,”
af-uni-0010.002, Sept. 1994.
[8] The ATM Forum Technical Committee, “Private Network-Network Specification In-
terface v1.0F (PNNI 1.0),” af-pnni-0055.000, March 1996.
[9] L. Andersson, P. Doolan, N. Feldman, A. Fredette, B. Thomas, “LDP Specification,”
IETF RFC 3036, Jan. 2001.
[10] B. Jamoussi, L. Andersson, R. Callon, R. Dantu, L. Wu, P. Doolan, T. Worster, N.
Feldman, A. Fredette, M. Girish, E. Gray, J. Heinanen, T.Kilty, A. Malis, “Constraint-
Based LSP Setup using LDP,” IETF RFC 3212, Jan. 2002.
[11] D. Awduche, L. Berger, D. Gan, T. Li, et al., “RSVP-TE: Extensions to RSVP for LSP
Tunnels,” IETF RFC 3209, Dec. 2001.
[12] E. Mannie (editor), “Generalized Multi-Protocol Label Switching (GMPLS) Architec-
ture,” IETF RFC 3945, Oct. 2004.
[13] L. Berger, “Generalized Multi-Protocol Label Switching (GMPLS) Signaling Func-
tional Description,” IETF RFC 3471, Jan. 2003.
[14] P. Ashwood-Smith, L. Berger, “Generalized Multi-Protocol Label Switching (GM-
PLS) Signaling Constraint-based Routed Label Distribution Protocol (CR-LDP) Ex-
tensions,” IETF RFC 3472, Jan. 2003.
[15] L. Berger, “Generalized Multi-Protocol Label Switching (GMPLS) Signaling Re-
source ReserVation Protocol-Traffic Engineering (RSVP-TE) Extensions,” IETF RFC
3473, Jan. 2003.
[16] R. Braden, L. Zhang, S. Berson, S. Herzog, S. Jamin, “Resource ReSerVation Protocol
(RSVP) - Version 1 Functional Specification,” IETF RFC 2205, Sept. 1997.
[17] K. Kompella (editor), Y. Rekhter, “OSPF Extensions in Support of Generalized Multi-
Protocol Label Switching,” IETF Internet Draft, draft-ietf-ccamp-ospf-gmpls-exten-
sions-12.txt, Oct. 2003.
[18] J. Lang, “Link Management Protocol (LMP),” IETF Internet Draft, draft-ietf-ccamp-
lmp-10.txt, Oct. 2003.
[19] S. K. Long, R. R. Pillai, J. Biswas, T. C. Khong, “Call Performance Studies on the
ATM Forum UNI Signalling,” http://www.krdl.org.sg/Research/Publications/Papers/
pillai_uni_perf.pdf.
[20] M. Veeraraghavan and R. Karri, “Towards enabling a 2-3 orders of magnitude im-
provement in call handling capacities of switches,” NSF proposal 0087487, 2001.
[21] T. Hellstern, “Fast SVC Setup,” ATM Forum Contribution 97-0380, 1997.
[22] L. Roberts, “Inclusion of ABR in Fast Signaling,” ATM Forum Contribution 97-0796,
1997.
[23] P. Pan and H. Schulzrinne, “YESSIR: A Simple Reservation Mechanism for the Inter-
net,” IBM Research Report, RC 20967, Sept. 2, 1997.
[24] M. Veeraraghavan, G. L. Choudhury, and M. Kshirsagar, “Implementation and Analy-
sis of Parallel Connection Control (PCC),” Proc. of IEEE Infocom'97, Kobe, Japan,
Apr. 7-11, 1997.
[25] P. Boyer and D. P. Tranchier, “A Reservation Principle with Applications to the ATM
Traffic Control,” Computer Networks and ISDN Systems Journal 24, 1992, pp. 321-
334.
[26] I. Baldine, G. N. Rouskas, H. G. Perros, D. Stevenson, “JumpStart: A Just-in-Time Sig-
naling Architecture for WDM Burst-Switched Networks,” IEEE Communication
Magazine, pages 82-89, Feb. 2002.
[27] D. Gluskin and I. Cidon, “A Hardware Signaling Paradigm for Fine-Grained Resource
Reservation”, Technion, CCIT Publication # 433, June 2003.
[28] M. Benz, “An Architecture and Prototype Implementation for TCP/IP Hardware Sup-
port,” Proc. of TERENA Networking Conference 2001, May 14-17, 2001, Antalya,
Turkey.
[29] 10 Gigabit Ethernet Alliance white paper, “Introduction to TCP/IP Offload Engine
(TOE),” Apr. 2002, http://www.10gea.org/SP0502IntroToTOE_F.pdf.
[30] P. Molinero-Fernandez, N. Mckeown, “TCP Switching: Exposing Circuits to IP,”
IEEE Micro, vol. 22, issue 1, Jan./Feb. 2002, pp. 82-89.
[31] H. Wang, M. Veeraraghavan, and R. Karri, “A Hardware Implementation of a Signaling
Protocol,” in Proc. of Opticomm 2002, July 29-Aug. 2, 2002, Boston, MA.
[32] H. Wang, M. Veeraraghavan, T. Li, “Specification of a subset of GMPLS RSVP-TE
for a hardware-accelerated implementation,” http://eeweb1.poly.edu/networks/papers/
rsvp-subset.pdf, Nov. 2002.
[33] H. Wang, M. Veeraraghavan, R. Karri, “Detailed description of GMPLS RSVP-TE
signaling procedures for hardware implementation,” http://eeweb1.poly.edu/networks/
papers/procedures/procedures.pdf, Dec. 2002.
[34] H. Wang, R. Karri, M. Veeraraghavan, and T. Li, “Hardware-Accelerated Implemen-
tation of the RSVP-TE Signaling Protocol,” Proc. of IEEE ICC2004, June 20-24,
2004, Paris, France.
[35] M. Waldvogel, G. Varghese, J. Turner, B. Plattner, “Scalable High Speed IP Routing
Lookups,” Proc. ACM SIGCOMM’97, Sept. 14-18, 1997, Palais des Festivals, France.
[36] P. Gupta, S. Lin, N. McKeown, “Routing Lookups in Hardware at Memory Access
Speeds,” IEEE INFOCOM 1998, pp. 1240-1247, March 29-April 2, 1998, San Fran-
cisco, USA.
[37] R. Braden, D. Clark, S. Shenker, “Integrated Services in the Internet Architecture: an
Overview,” IETF RFC 1633, June 1994.
[38] Eric Mannie, D. Papadimitriou, “Generalized Multi-Protocol Label Switching (GM-
PLS) Extensions for SONET and SDH Control,” IETF RFC 3946, Oct. 2004.
[39] D. Papadimitrious, “Generalized MPLS (GMPLS) Signaling Extensions for G.709
Optical Transport Networks Control,” IETF Internet Draft, draft-ietf-ccamp-gmpls-
g709-08.txt, Sept. 2004.
[40] L. Berger, D. Gan, et al., “RSVP Refresh Overhead Reduction Extensions,” IETF RFC
2961, April 2001.
[41] Xilinx datasheet, “Virtex-II 1.5V Field-Programmable Gate Arrays, Advanced Prod-
uct Specification,” Oct. 2001, http://direct.xilinx.com/bvdocs/publications/ds031.pdf.
[42] Xilinx user guide, “Virtex-II Platform FPGA User Guide,” Nov. 2002, http://direct.xil-
inx.com/bvdocs/userguides/ug002.pdf.
[43] Xilinx application note, “Two Flows for Partial Reconfiguration: Module Based or
Difference Based,” Nov. 2003. http://www.xilinx.com/bvdocs/appnotes/xapp290.pdf.
[44] http://www.sycamorenet.com
[45] Agilent Technologies datasheet, “1 x 9 Fiber Optic Transceivers for Gigabit Ethernet,
HFBR-53D5 Family,” Nov. 1999.
[46] Agilent Technologies datasheet, “Gigabit Ethernet and Fibre Channel SerDes ICs,
HDMP-1636A Transceiver,” April 2002.
[47] LSI Logic datasheet, “8101/8104 Gigabit Ethernet Controller,” Nov. 2001.
[48] IDT datasheet, “3.3 Volt High-Density SuperSync™ II 36-bit FIFO - 64K x 36,
IDT72V36100,” Feb. 2003.
[49] IDT datasheet, “Network Search Engine - 64K x 72 Entries, IDT75P52100,” Feb.
2002.
[50] IDT datasheet, “3.3V Synchronous ZBT™ SRAMs, 2.5V I/O, Burst Counter, Pipe-
lined Outputs - 128K x 36, IDT71V2556S,” May 2002.
[51] Vitesse datasheet, “64x64 STS-12/STM-4 TSI Switch Fabric, VSC9182,” Dec. 2001.
[52] PCI Special Interest Group, “PCI Local Bus Specification, Product Version, Revision
2.1,” June 1, 1995.
[53] M. E. Crovella and A. Bestavros, “Self-similarity in World Wide Web Traffic Evi-
dence and Possible Causes,” Proc. of the SPIE International Conference on Perfor-
mance and Control of Network Systems, Nov., 1997.
[54] D. Gross, C. M. Harris, Fundamentals of Queueing Theory, 3rd edition, Wiley-Inter-
science, 1998.
[55] http://www.nlanr.net.
[56] V. Paxson and S. Floyd, “Wide-Area Traffic: The Failure of Poisson Modeling,” IEEE/
ACM Transactions on Networking, Vol. 3, June 1995.
[57] H.Wang, M.Veeraraghavan and R. Karri, “A Dynamic Circuits Based Wide-Area SAN
Solution,” the 1st International Workshop on Optical Networking Technologies for
Global SAN solutions, Oct. 13, 2002, Dallas, TX.
[58] Alberto Leon-Garcia, Indra Widjaja, Communication Networks - Fundamental Con-
cepts and Key Architectures, McGraw-Hill Higher Education.
[59] E. Mannie, D. Papadimitriou, “Recovery (Protection and Restoration) Terminology
for Generalized Multi-Protocol Label Switching (GMPLS),” IETF Internet Draft,
draft-ietf-ccamp-gmpls-recovery-terminology-05.txt, Oct. 2004.
[60] D. Papadimitriou, Eric Mannie, “Analysis of Generalized Multi-Protocol Label
Switching (GMPLS)-based Recovery Mechanisms (including Protection and Restora-
tion),” IETF Internet Draft, draft-ietf-ccamp-gmpls-recovery-analysis-04.txt, Oct.
2004.
[61] J. P. Lang, B. Rajagopalan, “Generalized Multi-Protocol Label Switching (GMPLS)
Recovery Functional Specification,” IETF Internet Draft, draft-ietf-ccamp-gmpls-re-
covery-functional-03.txt, Oct. 2004.
[62] J.P. Lang, Y. Rekhter, D. Papadimitriou, “RSVP-TE Extensions in support of End-to-
End GMPLS-based Recovery,” IETF Internet Draft, draft-ietf-ccamp-gmpls-recovery-
e2e-signaling-01.txt, May 2004.
[63] J. Wang, L. Sahasrabuddhe, B. Mukherjee, “Path vs. subpath vs. link restoration for
fault management in IP-over-WDM networks: performance comparisons using GM-
PLS control signaling,” IEEE Communication Magazine, Vol. 40, issue 11, pp. 80-
87, Nov. 2002.
[64] A. Banerjee, J. Drake, J. Lang, “Generalized Multiprotocol Label Switching: An
Overview of Signaling Enhancements and Recovery Techniques,” IEEE Communica-
tion Magazine, Jan. 2001.
[65] M. Alicherry, H. Nagesh, C. Phadke, and V. Poosala, “FASTeR: sub-50-ms shared res-
toration in mesh networks,” Journal of Optical Networking, Vol.2, No.11, Nov. 2003.
[66] G. V. Kaigala, W. D. Grover, “On the efficacy of GMPLS auto-reprovisioning as a
mesh-network restoration mechanism,” Proceedings of IEEE GLOBECOM 2003,
Vol.7, Dec. 2003.
[67] G. Li, J. Yates, R. Doverspike, and D. Wang, “Experiments in fast restoration using
GMPLS in optical/electronic mesh networks,” Proceedings of OFC 2001, Vol.4, pp.
PD34-(1-3), March 2001.
[68] Jeff Schallenberg, “Is 50 ms Restoration Necessary?” IEEE Bandwidth Management
Workshop IX, June 25-28, 2001, Montebello, Canada.