Hardware-Accelerated Signaling: Design,
Implementation and Implications
DISSERTATION
for the Degree of
DOCTOR OF PHILOSOPHY
(Electrical Engineering)
HAOBO WANG
November 2004
Hardware-Accelerated Signaling: Design,
Implementation and Implications
DISSERTATION
Submitted in Partial Fulfillment
of the REQUIREMENTS for the
Degree of
DOCTOR OF PHILOSOPHY (Electrical Engineering)
at the
POLYTECHNIC UNIVERSITY
by
Haobo Wang
November 2004
Approved:
_____________________________
Department Head
_________, 20___
Date

Copy No. ___
Approved by the Guidance Committee:
Major: Electrical Engineering
______________________
Malathi Veeraraghavan
Associate Professor of Electrical Engineering
______________________
Date

______________________
Ramesh Karri
Associate Professor of Electrical Engineering
______________________
Date

______________________
Shivendra Panwar
Professor of Electrical Engineering
______________________
Date
Minor: Computer Science
______________________
Haldun Hadimioglu
Associate Professor of Computer Science
______________________
Date
Microfilm or other copies of this dissertation are obtainable from
UMI Dissertations Publishing
Bell & Howell Information and Learning
300 North Zeeb Road
P. O. Box 1346
Ann Arbor, Michigan 48106-1346
VITA

Mr. Haobo Wang received his B.Sc. and M.Sc. degrees in Electrical Engineering from the University of Electronic Science and Technology of China, Chengdu, China, in 1997 and 2000, respectively. From April 2000 to August 2000, he worked for Lucent Technologies, China, as a hardware (ASIC) design engineer.
Since September 2000, Mr. Wang has conducted research towards his Ph.D. degree in Electrical Engineering at Polytechnic University, where he has been a research fellow of the Center of Advanced Technology in Telecommunications (CATT). The work presented in this dissertation was undertaken between January 2001 and November 2004 under the supervision of Professor Malathi Veeraraghavan and Professor Ramesh Karri.
Mr. Wang is a student member of the Institute of Electrical and Electronics Engineers
(IEEE), a student member of the IEEE Communications Society (ComSoc), and a student
member of the International Society for Optical Engineering (SPIE).
ACKNOWLEDGMENTS

I would like to express my deepest gratitude to the numerous people who have been
instrumental in the completion of this dissertation. First and foremost, I want to thank my
advisors Professor Malathi Veeraraghavan and Professor Ramesh Karri for their guidance
and encouragement throughout the course of my Ph.D. program.
Professor Veeraraghavan tirelessly guided me in every step and aspect of my research
work, from the selection of my research topic to the final preparation of my dissertation
defense. Beyond her immense assistance with my research work, she helped shape me as a researcher, teaching me fundamentals ranging from better English writing skills to performing complex theoretical analyses. She
has been the single biggest motivating factor during these important years of my life. I especially want to thank Professor Karri for devoting his time and energy to my everyday research work during the last two years.
I would also like to thank the other members of my dissertation guidance committee,
Professor Shivendra S. Panwar and Professor Haldun Hadimioglu, for their thoughtful and
insightful advice. Professor Panwar gave me valuable comments on the applications of
hardware-accelerated signaling, which greatly enriched my dissertation. Professor Hadimioglu took the time to proofread the dissertation and helped iron out many mistakes, both technical and grammatical.
Many thanks go to my colleagues in the Wireless Networks lab and the CAD Research
lab. It has always been enjoyable to discuss technical and non-technical issues with them.
I especially thank Tao Li, Dr. Xuan Zheng, and Zhifeng Tao with whom I spent my first
two years at Poly. I also thank Dr. Kaijie Wu, Bo Yang, Dr. Piyush Mishra, Nikhil Joshi,
and Tongquan Wei, with whom I have had a great time during the past two years. These wonderful times have resulted in great friendships, both technical and personal. The journey through these past four years of graduate life at Poly has been one of the best experiences of my life.
I owe a huge debt of gratitude to my family: my parents and my sister Minmei, for
their dedicated and endless love and support not only throughout the course of my Ph.D.
studies, but my entire life. Without them, I could not have achieved anything that I have
achieved today.
Finally, I thank the National Science Foundation (NSF) and the Center of Advanced Technology in Telecommunications (CATT) at Polytechnic University for providing me with the financial support required for my Ph.D. research.
Abbreviations and Acronyms

API Application Programming Interface
APS Automatic Protection Switching
ASIC Application-Specific Integrated Circuit
ATM Asynchronous Transfer Mode
BGP Border Gateway Protocol
BNF Backus-Naur Form
CAC Connection Admission Control
CAM Content Addressable Memory
CCAMP WG Common Control and Measurement Plane Working Group
CR-LDP Constraint-based Routed Label Distribution Protocol
DCC Data Communication Channel
DDR Double Data Rate
DSP Digital Signal Processing
DWDM Dense Wavelength Division Multiplexing
FIFO First-In First-Out
FPGA Field Programmable Gate Array
FTP File Transfer Protocol
GbE Gigabit Ethernet
GMPLS Generalized Multi-Protocol Label Switching
GPU Graphics Processing Unit
HBA Host Bus Adapter
IANA Internet Assigned Numbers Authority
IETF Internet Engineering Task Force
IntServ Integrated Services
IP Internet Protocol
LDCC Line Data Communication Channel
LDP Label Distribution Protocol
LMP Link Management Protocol
LOL Loss of Light
LOS Loss of Signal
LPM Longest Prefix Match
LSB Least Significant Bit
LSP Label Switched Path
LSR Label Switching Router
LVDS Low Voltage Differential Signaling
MAC Media Access Control
MPLS Multi-Protocol Label Switching
MSB Most Significant Bit
MTP Message Transfer Part
NIC Network Interface Card
NSE Network Search Engine
OC Optical Carrier
OCSP Optical Circuit-switched Signaling Protocol
OIF Optical Internetworking Forum
OSPF-TE Open Shortest Path First-Traffic Engineering
OTN Optical Transport Network
PCB Printed Circuit Board
PCI Peripheral Component Interconnect
PCS Physical Coding Sublayer
PNNI Private Network to Network Interface
QoS Quality of Service
RAM Random Access Memory
RFC Request for Comments
RSVP Resource ReSerVation Protocol
RSVP-TE Resource ReSerVation Protocol-Traffic Engineering
RTT Round-Trip Time
SAN Storage Area Network
SD Signal Degrade
SDCC Section Data Communication Channel
SDH Synchronous Digital Hierarchy
SerDes Serializer/Deserializer
SONET Synchronous Optical NETwork
SRAM Static Random Access Memory
SS7 Signaling System 7
STS Synchronous Transport Signal
TCAM Ternary Content Addressable Memory
TCP Transmission Control Protocol
TLV Type-Length-Value
TOE TCP Offload Engine
TTL Time-to-Live
UDP User Datagram Protocol
UNI User Network Interface
VHDL VHSIC Hardware Description Language
VHSIC Very High Speed Integrated Circuits
VLSI Very Large Scale Integration
WDM Wavelength Division Multiplexing
AN ABSTRACT

Hardware-Accelerated Signaling: Design, Implementation and Implications
by
Haobo Wang
Advisor: Malathi Veeraraghavan
Submitted in Partial Fulfillment of the Requirements for the
Degree of Doctor of Philosophy (Electrical Engineering)
November 2004
Despite the dominance of connectionless IP networks, i.e., the Internet, connection-oriented networks are gaining attention because of their inherent support for Quality of Service (QoS). Even IP routers are being enhanced with connection-oriented features. Signaling protocols are used in connection-oriented networks to set up and tear down connections.

Signaling protocols are primarily implemented in software for two reasons: complexity and the requirement for flexibility. Although these are two good reasons for software implementations, the price paid is in performance. Software implementations of signaling protocols are rarely capable of handling over 1000 calls/sec. Correspondingly, per-switch call processing delays are on the order of milliseconds.

We propose to implement signaling protocols in hardware, expecting a 2-3 orders of magnitude improvement in the call handling capacities of switches. As a first step, we define the Optical Circuit-switched Signaling Protocol (OCSP), a performance-oriented signaling protocol designed for SONET switches. We implement OCSP on the WILDFORCE FPGA evaluation board. The simulation results show a call handling rate of 150,000 calls/sec and a per-switch call processing delay of 6.6 µs.

We then choose RSVP-TE for hardware acceleration. As a signaling protocol targeting almost all connection-oriented networks, RSVP-TE for GMPLS is complex and flexible, and not intended for hardware implementation. However, it is not only impractical but also unnecessary to implement the complete signaling protocol in hardware. Instead, we extract a subset of RSVP-TE for hardware acceleration and relegate the functionality beyond this subset to software. The subset is large enough to cover the time-critical operations while small enough to make hardware implementation feasible. We implement the subset on a Xilinx Virtex-II FPGA device, which we call the hardware signaling accelerator. With a fully pipelined architecture and innovative design techniques, a call handling rate of 250,000 calls/sec is achieved, and the per-switch call processing delay is 7.2 µs.

In order to demonstrate a complete switching system equipped with the hardware signaling accelerator, we design a PCI-based prototype board including both user-plane and control-plane devices. The user plane carries the user traffic while the control plane controls the operation of the user plane. The hardware signaling accelerator is the core component of the control plane.

The total call setup delay consists of three components: the round-trip propagation delay, the call processing delays, and the signaling message emission delays. The round-trip propagation delay is limited by the speed of light. With hardware-accelerated signaling, the call processing delays can be reduced to the order of microseconds. The last component is determined by the choice of signaling transport option: in-band signaling or out-of-band signaling. We compare the total call setup delay for both options under different scenarios.

The impact of hardware-accelerated signaling is quite far-reaching. It will fundamentally change our current view of connection-oriented networks. With the per-switch call processing delay decreasing to the order of microseconds and the call handling rate increasing to the order of 10⁵ calls/sec, connections can be established and released much faster, and switches can handle more requests and maintain more simultaneous connections. New architectures and applications that can better exploit hardware-accelerated signaling are proposed and analyzed.
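The three-component decomposition of the total call setup delay can be illustrated with a back-of-the-envelope calculation. This is only a sketch with assumed parameter values (path length, hop count, message size, link rate), not the dissertation's detailed queueing model:

```python
# Illustrative breakdown of total call setup delay into round-trip
# propagation, per-switch processing, and per-hop message emission.
# All parameter values below are assumptions for illustration only.

C = 2.0e8  # assumed propagation speed in fiber, m/s (~2/3 of c)

def call_setup_delay(distance_m, hops, proc_delay_s, msg_bits, link_bps):
    """Round-trip propagation + per-switch processing (forward and
    reverse passes) + per-hop signaling message emission (both passes)."""
    rtt = 2 * distance_m / C
    processing = 2 * hops * proc_delay_s
    emission = 2 * hops * (msg_bits / link_bps)
    return rtt + processing + emission

# Wide-area example: 2000 km path, 5 switches, 500-byte signaling message
dist, hops, msg = 2000e3, 5, 500 * 8

software = call_setup_delay(dist, hops, proc_delay_s=1e-3, msg_bits=msg, link_bps=155e6)
hardware = call_setup_delay(dist, hops, proc_delay_s=7.2e-6, msg_bits=msg, link_bps=1e9)

print(f"software signaling: {software*1e3:.2f} ms")
print(f"hardware signaling: {hardware*1e3:.2f} ms")
# With millisecond software processing, the processing term adds markedly
# to the total; with microsecond hardware processing, the total approaches
# the speed-of-light round-trip bound (20 ms for this path length).
```

The sketch shows why, once call processing drops to microseconds, the propagation and emission components dominate, which is exactly what motivates the in-band versus out-of-band transport comparison.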
Contents

Acknowledgments ..... i
Abbreviations and Acronyms ..... iii
Abstract ..... vi

1 Introduction ..... 1
  1.1 Background ..... 1
  1.2 Problem statement ..... 6
  1.3 Related work ..... 7
  1.4 Outline ..... 9
2 Optical Circuit-switched Signaling Protocol (OCSP): Specification and Implementation ..... 11
  2.1 OCSP - a signaling protocol for performance ..... 11
    2.1.1 Signaling messages ..... 12
    2.1.2 State transition diagram of a connection ..... 13
    2.1.3 Data tables ..... 14
  2.2 Implementation of OCSP on WILDFORCE platform ..... 15
    2.2.1 Parameters ..... 16
    2.2.2 State transition diagram of the hardware signaling accelerator ..... 17
    2.2.3 Managing the available time-slots in hardware ..... 18
    2.2.4 Managing the connection references in hardware ..... 19
    2.2.5 Implementation and simulation results ..... 20
3 A Subset of RSVP-TE for Hardware-Acceleration ..... 23
  3.1 Review of RSVP-TE for GMPLS networks ..... 24
  3.2 A Subset of RSVP-TE for GMPLS networks ..... 25
    3.2.1 RSVP messages ..... 25
      3.2.1.1 RSVP common header ..... 25
      3.2.1.2 RSVP messages ..... 26
    3.2.2 RSVP objects ..... 29
      3.2.2.1 FILTER_SPEC/SENDER_TEMPLATE object ..... 30
      3.2.2.2 FLOWSPEC/SENDER_TSPEC object ..... 30
      3.2.2.3 Generalized LABEL object ..... 31
      3.2.2.4 Generalized LABEL_REQUEST object ..... 32
      3.2.2.5 RSVP_HOP object ..... 33
      3.2.2.6 SESSION object ..... 34
    3.2.3 Optional objects support ..... 34
      3.2.3.1 SUGGESTED_LABEL object ..... 35
      3.2.3.2 MESSAGE_ID/MESSAGE_ID_ACK object ..... 36
    3.2.4 Connection setup and teardown with RSVP-TE signaling messages ..... 37
4 RSVP-TE Procedures for Hardware-Acceleration ..... 38
  4.1 Registers ..... 38
  4.2 Data tables ..... 40
    4.2.1 Routing table ..... 41
    4.2.2 Incoming Connectivity table ..... 41
    4.2.3 Outgoing Connectivity table ..... 42
    4.2.4 Outgoing CAC table ..... 43
    4.2.5 User/Control Mapping table ..... 44
    4.2.6 State table ..... 45
  4.3 Procedures ..... 45
5 An FPGA-Based Hardware Signaling Accelerator ..... 50
  5.1 The architecture of the hardware signaling accelerator ..... 51
  5.2 Functional modules ..... 51
    5.2.1 Register bank ..... 51
    5.2.2 Input message buffering system ..... 53
    5.2.3 Two-level object dispatching ..... 55
    5.2.4 Data table management ..... 56
    5.2.5 Resource (bandwidth) management ..... 58
    5.2.6 Re-transmission management ..... 60
  5.3 Implementation and simulation results ..... 63
  5.4 Re-configurability ..... 65
6 Design of a Prototype Switching System ..... 67
  6.1 Architecture of the PCI-based prototype switching system ..... 68
  6.2 Main on-board devices ..... 70
    6.2.1 1 Gbps signaling channel ..... 70
    6.2.2 Incoming message buffer - FIFO device ..... 71
    6.2.3 Data tables - the TCAM and SRAM devices ..... 74
      6.2.3.1 The TCAM and SRAM devices ..... 74
      6.2.3.2 The organization of the data tables ..... 78
    6.2.4 User-plane device - VSC9182 ..... 81
  6.3 Clock distribution networks ..... 83
  6.4 Power distribution scheme ..... 86
7 Design and Analysis of Signaling Networks ..... 87
  7.1 Introduction to in-band signaling and out-of-band signaling ..... 88
  7.2 Delay models ..... 89
    7.2.1 Assumptions ..... 90
    7.2.2 Notation ..... 91
    7.2.3 Queueing model ..... 91
      7.2.3.1 Model without re-transmissions ..... 92
      7.2.3.2 Model including re-transmissions ..... 94
  7.3 Numerical results and implications ..... 98
    7.3.1 Input parameter values ..... 98
    7.3.2 Hardware-accelerated signaling engine, µtx = µproc ..... 100
    7.3.3 Hardware-accelerated signaling engine, µtx ≪ µproc ..... 102
    7.3.4 Software signaling processor ..... 103
  7.4 Some comments ..... 104
8 Applications of Hardware-Accelerated Signaling ..... 106
  8.1 Utilization analysis of circuit-switched networks for file transfers ..... 106
    8.1.1 Problem statement ..... 106
    8.1.2 Utilization analysis and the numerical results ..... 109
  8.2 Hardware-accelerated signaling and network survivability ..... 113
    8.2.1 Utilization and delay analysis for path restoration ..... 115
9 Conclusions and Future Work ..... 119
List of Figures

Figure 1. Illustration of connection setup. ..... 3
Figure 2. Unfolded view of a switch in connection-oriented networks. ..... 4
Figure 3. GMPLS protocol stack diagram. ..... 4
Figure 4. An example of connection-oriented networks. ..... 5
Figure 5. OCSP signaling messages. ..... 12
Figure 6. State transition diagram of a connection. ..... 13
Figure 7. Data tables used by the signaling protocol. ..... 14
Figure 8. Architecture of WILDFORCE™ board. ..... 15
Figure 9. State transition diagram of the hardware signaling accelerator. ..... 17
Figure 10. Time-slot manager. ..... 18
Figure 11. Connection reference manager. ..... 19
Figure 12. Timing simulation for Setup message. ..... 21
Figure 13. RSVP message common header. ..... 25
Figure 14. RSVP object. ..... 29
Figure 15. TLV structure of the SESSION object. ..... 30
Figure 16. FILTER_SPEC/SENDER_TEMPLATE object. ..... 30
Figure 17. FLOWSPEC/SENDER_TSPEC object. ..... 31
Figure 18. LABEL object. ..... 32
Figure 19. Format of a SONET/SDH label. ..... 32
Figure 20. SONET multiplexing hierarchy and generalized label. ..... 32
Figure 21. LABEL_REQUEST object. ..... 33
Figure 22. RSVP_HOP object. ..... 33
Figure 23. SESSION object. ..... 34
Figure 24. MESSAGE_ID/MESSAGE_ID_ACK object. ..... 36
Figure 25. Illustration of connection setup with RSVP-TE messages. ..... 37
Figure 26. Network used for illustrative examples. ..... 40
Figure 27. Example of Avail_BW. ..... 44
Figure 28. Processing of Common header. ..... 46
Figure 29. Processing of Path message. ..... 46
Figure 30. Processing of SESSION object. ..... 48
Figure 31. At the end of Path message processing. ..... 49
Figure 32. Architecture of the hardware signaling accelerator. ..... 51
Figure 33. Dynamic round-robin style pipelining. ..... 52
Figure 34. Two levels of input message buffering. ..... 53
Figure 35. Buffer management module. ..... 54
Figure 36. TLV-style object processing. ..... 55
Figure 37. Dependencies among the data tables. ..... 57
Figure 38. Priority arbitrator for TCAM. ..... 58
Figure 39. Resource management module. ..... 59
Figure 40. Comparison of resource allocation schemes. ..... 60
Figure 41. Re-transmission management (buffers and timers). ..... 61
Figure 42. Processing of Path message. ..... 64
Figure 43. Processing of Resv message. ..... 64
Figure 44. Processing of PathTear message. ..... 65
Figure 45. Architecture of the prototype board. ..... 68
Figure 46. 1 Gbps signaling channel. ..... 70
Figure 47. Block diagram of L8104. ..... 70
Figure 48. Block diagram of IDT72V36100. ..... 72
Figure 49. RAM and CAM. ..... 74
Figure 50. How a TCAM works. ..... 76
Figure 51. FPGA/TCAM/SRAM configuration. ..... 76
Figure 52. TCAM command bus. ..... 77
Figure 53. 72-bit look-up timing diagram. ..... 78
Figure 54. Configurations of the five data tables in the TCAM device. ..... 79
Figure 55. Use matched address as interface ID. ..... 80
Figure 56. Organization of the data tables. ..... 81
Figure 57. Block diagram of VSC9182. ..... 82
Figure 58. Address bus and data bus. ..... 82
Figure 59. Timing diagram to program the switch fabric. ..... 83
Figure 60. Clock distribution scheme. ..... 84
Figure 61. In-band and out-of-band signaling options. ..... 88
Figure 62. Three cases of in-band signaling. ..... 88
Figure 63. Queueing models of the signaling protocol processor and the signaling channel transmitters. ..... 92
Figure 64. Signaling channel transmitter model including re-transmissions. ..... 95
Figure 65. In-band/out-of-band signaling with hardware signaling, metro area, µtx = µproc. ..... 100
Figure 66. In-band/out-of-band signaling with hardware signaling, wide area, µtx = µproc. ..... 101
Figure 67. In-band/out-of-band signaling with hardware signaling, metro area, µtx ≪ µproc. ..... 102
Figure 68. In-band/out-of-band signaling with hardware signaling, wide area, µtx ≪ µproc. ..... 103
Figure 69. In-band/out-of-band signaling with software signaling, metro area. ..... 104
Figure 70. In-band/out-of-band signaling with software signaling, wide area. ..... 104
Figure 71. Aggregate network utilization vs. offered traffic load. ..... 108
Figure 72. Crossover file size vs. percentage of the offered load. ..... 108
Figure 73. A model for a circuit-switched node operated in call-blocking mode. ..... 109
Figure 74. Crossover file size vs. total utilization (hardware signaling). ..... 111
Figure 75. Crossover file size vs. total utilization (software signaling). ..... 111
Figure 76. Pure software signaling needs more client-side interfaces to create the aggregate load to achieve the desired utilization. ..... 112
Figure 77. A sample network with a single link failure. ..... 115
List of Tables

Table 1. Clock cycles consumed by various OCSP messages. ..... 22
Table 2. Data registers. ..... 38
Table 3. Configuration registers. ..... 40
Table 4. Routing table. ..... 41
Table 5. Incoming Connectivity table. ..... 41
Table 6. Incoming Connectivity table at switch II in Fig. 26. ..... 42
Table 7. Outgoing Connectivity table. ..... 42
Table 8. Outgoing Connectivity table for switch I in Fig. 26. ..... 43
Table 9. Outgoing CAC table. ..... 43
Table 10. Outgoing CAC table for switch I in Fig. 26. ..... 43
Table 11. User/Control Mapping table. ..... 44
Table 12. State table. ..... 45
Table 13. Implementation results. ..... 63
Table 14. Clock cycles to receive an RSVP-TE message. ..... 63
Table 15. L8104 system interface. ..... 71
Table 16. Main interface signals of IDT72V36100. ..... 72
Table 17. Typical TCAM and SRAM signals. ..... 78
Table 18. Signals used to program the switch fabric. ..... 83
Table 19. Devices and related clock signals. ..... 84
Table 20. Power dissipation of main devices. ..... 86
Table 21. Notation. ..... 91
Table 22. Input parameter values. ..... 98
Table 23. Notation. ..... 109
Table 24. Number of circuits needed for a given utilization. ..... 112
Table 25. Notation. ..... 114
Chapter 1 Introduction
1.1 Background
Since the premiere of ARPAnet, the precursor to today’s Internet, in 1969, the reach of
connectionless IP networks has been growing. Nowadays the Internet is so successful that
more than 800 million users around the world use the Internet regularly [1]. The Internet
Assigned Numbers Authority (IANA) has allocated more than 2 billion IP addresses to
countries and regions [2], which account for more than half of the 2^32 total IPv4
addresses. The history of connection-oriented networks dates back even further to 1879,
when the then Bell Telephone Company set up the first telephone network in the world.
Now different types of connection-oriented networks, such as Asynchronous Transfer
Mode (ATM) networks, Synchronous Optical NETworks (SONET), Synchronous Digital
Hierarchy (SDH) networks, and Dense Wavelength Division Multiplexed (DWDM) net-
works, are deployed widely. However, these connection-oriented networks are primarily
used as carriers for IP traffic. For example, Abilene [3], the backbone of Internet2® [4],
leases OC-192c/OC-48c SONET circuits from Qwest Communications to inter-connect
backbone routers. Internet2 provides high-speed IP service to end users while the aggre-
gate user traffic is carried on the SONET circuits.
Two issues typically associated with connection-oriented networks are as follows.
First, in connection-oriented networks, a connection must be established before data trans-
fer can take place. This process is called connection (call) setup, which involves the
exchange of signaling messages between end hosts and the associated per-hop resource
reservation. The connection (call) setup overhead becomes significant if the holding time
of the connection is short, which is typical for today’s end user traffic. Second, once a connection is established, the end-to-end connection is dedicated to the two end systems on
that connection. There is no resource-sharing, which may lead to poor utilization if the
user traffic is bursty.
However, connection-oriented networks have been gaining more attention recently
because of their inherent support for Quality of Service (QoS). Standardization organiza-
tions, such as the Optical Internetworking Forum (OIF) and the Internet Engineering Task Force
(IETF) Common Control and Measurement Plane (CCAMP) working group, are creating
protocols and standards for connection-oriented networks, both packet-switched and circuit-
switched. New network architectures and applications are proposed to better utilize con-
nection-oriented networks. For example, in [5], the authors note that file transfers have no
intrinsic burstiness, and hence identify file transfer as an application that can fully utilize
connections.
Signaling protocols are used in connection-oriented networks to set up and tear down
connections. Examples of signaling protocols include the Signaling System 7 (SS7) in
telephone networks [6], the User Network Interface (UNI) [7] and Private Network to Net-
work Interface (PNNI) [8] signaling protocols in ATM networks, Label Distribution Pro-
tocol (LDP) [9], Constraint-based Routed LDP (CR-LDP) [10] and Resource ReSerVation
Protocol - Traffic Engineering (RSVP-TE) [11] in Multi-Protocol Label Switched (MPLS)
networks, and the extensions of these protocols for Generalized MPLS (GMPLS) [12]-
[15], which includes SONET/SDH and DWDM networks.
Setting up a connection at a switch involves five steps:
1. Determining the next-hop switch toward which the connection should be routed. This
task typically requires a data table look-up. Routes to destinations are pre-computed
using data gathered by routing protocols and stored in data tables.
2. Checking the availability of and reserving required resources (link capacity and
optionally buffer space). This step is generally called connection admission control.
3. Assigning “labels” for the connection. The exact form of the “label” depends on the
type of connection-oriented network. For example, in SONET/SDH switches, a label
identifies time-slots on the incoming and outgoing switch interfaces.
4. Programming the switch fabric to map incoming labels to outgoing labels.
5. Updating the state information associated with the connection.
In a typical connection setup procedure as illustrated in Fig. 1, a Setup message1
requesting the setup of a connection progresses from the calling end device (source)
toward the called end device (destination) hop-by-hop, and a Setup Success message trav-
els in the reverse direction, again hop-by-hop. In this scenario, the first three steps should
be performed in the forward direction so that resources are reserved as the Setup proceeds,
while the remaining steps could be performed as signaling message proceeds in the for-
ward direction or in the reverse direction. Other variants of this procedure are possible
such as reverse direction resource reservation [16].
1 Here we use a generic name for the message, i.e., Setup. Different signaling protocols call this message by differentnames, e.g., Setup in ATM UNI, Label Request in CR-LDP, or Path in RSVP-TE.
[Figure elements: source and destination end devices connected through switches SW1-SW5; Setup messages (steps 1-4) travel hop-by-hop from source to destination, Setup Success messages (steps 5-8) travel back hop-by-hop, and in step 9 the connection (circuit or virtual circuit) is established.]

Figure 1. Illustration of connection setup.
Upon completion of data transfer, the connection is released with a similar end-to-end
release procedure. Typically release messages are also confirmed. Switches processing the
release messages free up bandwidth, optionally buffer, and label resources for usage by
the next connection.
To support the above-described connection setup and release procedures, signaling
messages with parameters in each message, some mandatory and some optional, are
defined in a typical signaling protocol. In addition, other messages to support notifica-
tions, keep-alive exchanges, etc., are also present in signaling protocols.
With regards to implementation, we illustrate a typical architecture of a switch
(unfolded view) in connection-oriented networks in Fig. 2. The user-plane hardware con-
sists of a switch fabric and line cards that terminate input/output interfaces carrying user
traffic. In packet switches, the line cards perform network-layer protocol processing to
determine how to forward packets. In circuit switches, the line cards are typically multi-
plexers/demultiplexers.
Figure 2. Unfolded view of a switch in connection-oriented networks.
[User plane: line cards terminating the input/output interfaces, interconnected by a switch fabric. Control plane: NICs terminating the input/output signaling interfaces, a hardware signaling accelerator, and a microprocessor running the routing, signaling, and LMP processes.]

[Protocol stack: the routing (OSPF-TE, BGP), signaling (RSVP-TE, CR-LDP), and LMP processes running over TCP, UDP, and IP.]
Figure 3. GMPLS protocol stack diagram.
The control-plane unit typically consists of three modules, routing, link management,
and signaling. For example, the IETF CCAMP working group is defining the protocols for
GMPLS regarding all three aspects, such as Open Shortest Path First - Traffic Engineering
(OSPF-TE) [17] for routing, Link Management Protocol (LMP) [18] for link manage-
ment, and RSVP-TE/CR-LDP for signaling, as shown in Fig. 3. These control-plane mod-
ules are typically implemented as software processes residing on the microprocessor,
although in Fig. 2, we show a hardware signaling accelerator to speed up the processing of
signaling messages. Network Interface Cards (NICs) are shown on the control-plane.
These cards are used to process the lower layers of the signaling protocols on which the
signaling messages are carried. For example, in SS7 networks, the NICs process the Mes-
sage Transfer Part (MTP) layers, which are the lower layers of the SS7 protocol stack. In
optical networks, the expectation is that an out-of-band IP network will be used to carry
signaling messages between switches. In this case, the NICs may be Ethernet cards. It is
also possible to carry the signaling messages on the same interface as the user data. For
example, the Data Communication Channel (DCC) in each SONET/SDH signal is often
used for signaling. It is referred to as “in-band signaling.” Fig. 4 shows an example of con-
nection-oriented network, where user-plane and control-plane are located in different net-
works (“out-of-band signaling”).
Figure 4. An example of connection-oriented networks.
1.2 Problem statement
Signaling protocols are primarily implemented in software for two important reasons.
First, signaling protocols are quite complex with many messages, parameters and proce-
dures, especially the signaling protocols for GMPLS, which target almost all connection-
oriented networks. Second, signaling protocols are updated frequently requiring a certain
amount of flexibility for upgrading field implementations. For example, RSVP-TE with
extensions for GMPLS evolved from RSVP-TE for MPLS, which, in turn, evolved from
RSVP for IP networks. Request for Comments (RFCs) and Internet drafts are being con-
tinuously proposed and updated to enhance RSVP-TE. Hence an implementation of
RSVP-TE must be capable of frequent upgrades. While these are two good reasons for
implementing signaling protocols in software, the price paid is in performance. Signaling
protocol implementations in software are rarely capable of handling over 1000 calls/sec.
Correspondingly, call setup delays per-switch are in the order of milliseconds [19].
Connection-oriented networks have been gaining more attention because of their
inherent QoS support. Even IP routers are being enhanced with connection-oriented fea-
tures. For example, MPLS is being added to IP networks along with RSVP-TE to establish
Label Switched Path (LSP) tunnels. These trends require a break-through in call handling
capacities. The problem statement of this research is to determine whether signaling proto-
cols can be implemented in hardware, and if so, to demonstrate it with an actual imple-
mentation [20]. In addition, we explore the impact of hardware-accelerated signaling
protocol implementations and study the related question of how to reduce signaling mes-
sage transmission delays.
Implementing signaling protocols in hardware is a great challenge considering the
complexity and the requirement for flexibility of such protocols. The complexity problem
can be partly overcome by defining a subset of signaling protocols for hardware accelera-
tion. This subset should cover most time-critical operations. Nevertheless, innovative
hardware implementation techniques are required. Towards solving the problem of flexi-
bility, we propose to use re-configurable hardware, i.e., Field Programmable Gate Arrays
(FPGAs) as the hardware platform for implementation. FPGAs can be re-configured as
signaling protocols evolve.
Our implementation is targeted for transit switches in SONET networks, and it sup-
ports point-to-point unidirectional connections.
1.3 Related work
There are many signaling protocol standards as listed in Section 1.1. In addition, many
other signaling protocols have also been proposed in the literature [21]-[26]. Some of
these protocols such as fast reservation schemes [21][22], YESSIR [23], PCC [24] have
been designed to achieve low call setup delays by improving the signaling protocols them-
selves. Fast Reservation Protocol (FRP) [25] is the only signaling protocol that has been
implemented in Application Specific Integrated Circuit (ASIC) hardware. Such an ASIC
implementation is inflexible because upgrading the signaling protocol implementation
entails a complete re-design of the ASIC. Recently, a simple signaling protocol called
“JumpStart”, which was designed for hardware implementation, was proposed in [26] for
burst-switched networks.
Because of the popularity of RSVP-TE for MPLS networks and the desire for scalable
QoS support in large IP networks, some researchers have also explored hardware-acceler-
ated implementation of the RSVP-TE signaling protocol. In [27], the “Keep-It-Simple”
Signaling (KISS) protocol, a simplified version of RSVP-TE for hardware-acceleration,
was proposed. They implemented the KISS protocol in software for debugging and proto-
col development. They also discussed possible approaches for hardware implementation
and estimated the achievable performance, but no hardware implementation was carried
out. Because the KISS protocol is not compatible with RSVP-TE or any other available signaling
protocol, it cannot inter-operate with any deployed switches or MPLS-enabled IP routers.
The KISS protocol is targeted for MPLS-enabled IP networks, and will not work with
other connection-oriented network technologies.
Other comparable protocols implemented in hardware include TCP. In [28], Benz pro-
posed to implement the “normal” TCP functionalities in hardware and handle complex
functionalities, such as congestion control and error control, in software. He implemented his
approach on the Myrinet® platform. A similar concept, TCP Offload Engine (TOE) [29],
is gaining some popularity in today’s market. These solutions off-load part of the TCP
functionalities from the CPU to a co-processor located at a NIC or a Host Bus Adapter
(HBA, the NIC equivalent in Storage Area Networks).
Molinero-Fernandez and Mckeown [30] proposed to implement a technique called
TCP switching, in which the TCP SYNchronize segment is used to trigger connection
setup and TCP FINish segment is used to trigger connection release. By processing these
segments inside switches, the TCP SYN/FIN procedures become comparable to a signal-
ing protocol for connection setup/release. The authors planned to implement this tech-
nique in FPGAs.
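The flag test at the heart of TCP switching is tiny: a switch need only inspect two header bits to drive connection management. A sketch of that classification follows (flag bit values and the flag-byte offset are per the TCP specification, RFC 793; this is not code from [30]):

```python
SYN, FIN = 0x02, 0x01   # TCP flag bits (RFC 793)

def signaling_action(tcp_header: bytes):
    flags = tcp_header[13]        # byte 13 of the TCP header holds the flag bits
    if flags & SYN:
        return "setup"            # SYN segment triggers connection setup
    if flags & FIN:
        return "release"          # FIN segment triggers connection release
    return None                   # ordinary segment: forward it, no signaling
```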
1.4 Outline
This dissertation is organized as follows.
In Chapter 2, we discuss our work on the design and implementation of a new perfor-
mance-oriented signaling protocol called Optical Circuit-switched Signaling Protocol
(OCSP) [31], which we define for SONET networks. While in typical signaling protocol
definitions, flexibility is the primary concern, our primary goal in designing this protocol
is to achieve high-performance implementation.
Concurrent to our definition and implementation of OCSP, the optical networking
industry defined new signaling protocols for SONET/SDH/DWDM networks. Of these,
RSVP-TE for GMPLS is the one that has gained popularity, and is now widely imple-
mented by many vendors. This makes it important to demonstrate our concept of hardware
accelerated signaling with an implementation of RSVP-TE.
As a generalized signaling protocol targeting almost all connection-oriented networks,
RSVP-TE for GMPLS is quite complex and not intended for hardware implementation. To
solve this challenge we define a subset of RSVP-TE [32] for hardware acceleration and
relegate the remaining functionality to software. The former should be large enough to
handle most signaling messages and yet small enough to make hardware implementation
feasible. Chapter 3 describes this work.
In Chapter 4, we describe the procedures needed to implement this subset of RSVP-
TE [33]. Registers and data tables used in the hardware implementation are defined.
Based on the subset of RSVP-TE defined in Chapters 3 and 4, Chapter 5 describes the
architecture of hardware implementation. It is implemented on a Xilinx Virtex-II FPGA,
which we call the “hardware signaling accelerator [34].” We explain the main functional
modules in detail, present the implementation and simulation results. We also address the
issue of re-configurability.
In order to demonstrate a real switching system, in which a switch fabric can work
under the control of the hardware signaling accelerator described in Chapter 5, we build a
PCI-based prototype board. We present this work in Chapter 6. The board-level architec-
ture is given. Main on-board devices are discussed in detail. We also explain the clock and
power distribution schemes.
In Chapter 7 we discuss a related issue on how to reduce signaling message transmis-
sion delays. We compare the call setup delays of two options: in-band signaling and out-
of-band signaling under different scenarios. This issue is important because the selection
of signaling transport mechanisms affects the total call setup delay. Gains in call setup
delay achieved through hardware-accelerated signaling must be complemented by sound
design of the signaling transport mechanism.
Our work on hardware-accelerated implementation of signaling protocols has a far-
reaching impact on connection-oriented networks. With decreased call setup delay and
increased call handling rate, connections can be established and released much faster, and
switches can handle more requests and maintain more simultaneous connections. In
Chapter 8, we model and analyze the impact of hardware-accelerated signaling on current
connection-oriented networks, and propose applications that can fully exploit the benefit
of hardware-accelerated signaling.
Chapter 2 Optical Circuit-switched Signaling Protocol
(OCSP): Specification and Implementation
In Chapter 1, we mentioned many signaling protocols. Those protocols are complex
and flexible, and not intended for hardware implementation. In order to demonstrate the
feasibility and advantage of hardware-accelerated signaling, we design and implement a
signaling protocol called OCSP, which is specifically engineered for SONET networks.
While in typical signaling protocol definitions, flexibility is a primary concern, our pri-
mary goal in designing this protocol is to achieve high-performance implementations.
We introduce OCSP in Section 2.1. It is not a complete signaling protocol specifica-
tion. Our approach is to define a subset large enough that a significant percentage of user
requirements can be handled with this subset. Infrequent operations are delegated to the
software signaling process. Therefore, often in this description, we will leave out details
that are handled by the software. In Section 2.2, we demonstrate the hardware implemen-
tation of OCSP. The hardware platform, the design, and the implementation and simula-
tion results are discussed.
2.1 OCSP - a signaling protocol for performance
In this section, we first introduce four OCSP signaling messages used to set up (Setup
and Setup-Success), and tear down (Release and Release-Confirm) connections. We then
give the state transition diagram of a connection at each switch. Finally we discuss the
data tables we introduce to facilitate the hardware implementation of the OCSP signaling
protocol.
2.1.1 Signaling messages
We define four signaling messages, Setup, Setup-Success, Release, and Release-Con-
firm. Fig. 5 illustrates the detailed fields of these four messages.
The Message Length field specifies the length of a message. The Time-to-Live (TTL)
field has the same meaning as in IP header. The Message Type field distinguishes four dif-
ferent messages. The Connection Reference identifies a connection locally. The Source IP
Address, Destination IP Address, and Previous Node IP Address specify the end hosts and
the previous node respectively. The Bandwidth field specifies the bandwidth requirement
of a connection. The Interface/time-slot pairs are used to program the switch fabric. If
there are an odd number of interface/time-slot pairs, a 16-bit pad is inserted because the
message is 32-bit aligned. The Checksum field covers the whole message.
In Setup-Success message, the Bandwidth field records the allocated bandwidth. In
Release and Release-Confirm messages, the Cause field explains the reason of release.
The Message Length, Message Type and Connection Reference fields are common to all
messages and occupy the same relative position. Such an arrangement simplifies hardware
design.
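To make the byte layout concrete, here is a hedged sketch of building a Setup message in software. The split of the first 32-bit word (8-bit message length in words, 8-bit TTL, 4-bit message type, 12-bit connection reference), the 8-bit interface and time-slot sub-fields, and the use of an IP-style one's-complement checksum are all assumptions made for illustration; the dissertation does not pin these widths down in the text:

```python
import struct

def ones_complement_checksum(data: bytes) -> int:
    # IP-style 16-bit one's-complement checksum (assumed algorithm).
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def pack_setup(ttl, conn_ref_prev, dst_ip, src_ip, prev_ip, bandwidth, pairs):
    body = struct.pack("!III", dst_ip, src_ip, prev_ip)   # three IP addresses
    body += struct.pack("!HH", bandwidth, 0)              # bandwidth + reserved
    for interface, slot in pairs:                         # interface/time-slot pairs
        body += struct.pack("!BB", interface, slot)
    if len(pairs) % 2:                                    # odd pair count: 16-bit pad
        body += b"\x00\x00"                               # keeps the message aligned
    length = (4 + len(body) + 4) // 4                     # length in 32-bit words
    header = struct.pack("!BBH", length, ttl,
                         (0b0001 << 12) | (conn_ref_prev & 0x0FFF))
    msg = header + body + struct.pack("!HH", 0, 0)        # pad + zeroed checksum
    csum = ones_complement_checksum(msg)                  # covers the whole message
    return msg[:-2] + struct.pack("!H", csum)
```

With a one's-complement checksum, re-summing the finished message yields zero, which is how a receiver would verify it.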
Figure 5. OCSP signaling messages. [Field layout, 32-bit aligned:
Setup: Message Length | TTL | Msg. Type (0001) | Connection Ref. (prev.) | Destination IP Address | Source IP Address | Previous Node IP Address | Bandwidth | Reserved | Interface Number/Timeslot Number pairs | Pad Bits | Checksum.
Setup-Success: Message Length | Bandwidth | Msg. Type (0010) | Connection Ref. (prev.) | Connection Ref. (own) | Reserved | Checksum.
Release/Release-Confirm: Message Length | Cause | Msg. Type (0011/0100) | Connection Ref. (prev.) | Connection Ref. (own) | Reserved | Checksum.]
2.1.2 State transition diagram of a connection
Each connection passes through a certain sequence of states at each switch. In our pro-
tocol, we define four states: Setup-Sent, Established, Release-Sent and Closed. Fig. 6
shows the state transition diagram. Initially, the connection is in the Closed state. When a
switch receives a Setup message, if resources are available, the switch accepts the request,
reserves the resources, sends the message to the next switch on the path, and marks the
state of the connection as Setup-Sent. When a switch receives a Setup-Success message,
which means all switches along the path have successfully established the connection, the
state of the connection changes to Established. Release-Sent means the switch has
received a Release message, freed the allocated resources, and sent the message to the pre-
vious node. After the switch receives a Release-Confirm message, the connection is suc-
cessfully terminated, the state of the connection returns to Closed.
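This life cycle maps directly onto a small transition table. The sketch below uses the state encodings from Fig. 6; treating any message that does not match the current state as a "mismatch" is illustrative shorthand here, since the handling of mismatches is part of the error paths described later:

```python
# State encodings as given in Fig. 6.
ENCODING = {"Closed": 0b0001, "Setup-Sent": 0b0010,
            "Established": 0b0011, "Release-Sent": 0b0100}

# (current state, received message) -> next state
TRANSITIONS = {
    ("Closed", "Setup"): "Setup-Sent",
    ("Setup-Sent", "Setup-Success"): "Established",
    ("Established", "Release"): "Release-Sent",
    ("Release-Sent", "Release-Confirm"): "Closed",
}

def next_state(state, message):
    # Any unexpected message falls through as a state mismatch.
    return TRANSITIONS.get((state, message), "mismatch")
```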
Figure 6. State transition diagram of a connection. [Closed (0001) to Setup-Sent (0010) on Setup received; Setup-Sent to Established (0011) on Setup Success received; Established to Release-Sent (0100) on Release received; Release-Sent to Closed on Release Confirm received; Setup-Sent to Closed on Release or Release Confirm received.]
2.1.3 Data tables
There are five data tables associated with the signaling protocol, namely, Routing
table, Connection Admission Control (CAC) table, Connectivity table, State table, and
Switch-Mapping table, as shown in Fig. 7. The Routing table is used to determine the next-
hop switch. The index is the destination address; the fields include the address of the next
switch and the corresponding output interface. The CAC table maintains the available
bandwidth on the interfaces leading to neighboring switches. The Connectivity table is
used to map the interface numbers used at neighboring switches to local interface num-
bers. This information will be used to program the switch fabric.
The State table maintains the state information associated with each connection. The
connection reference is the index into the table. The fields include the connection refer-
ences and addresses of the previous and next switches, the bandwidth allocated for the
connection, and most importantly, the state information as defined in Fig. 6.
Switch fabrics, such as PMC-Sierra® PM5372, Agere® TDCS6440G and Vitesse®
VSC9182, have similar programming interfaces. For example, VSC9182 has an 11-bit
address bus A[10:0] and a 10-bit data bus D[9:0]. The switch is programmed by presenting
the output interface/time-slot pair on A[10:0] and the input interface/time-slot pair on
D[9:0]. We define a generic Switch-Mapping table to emulate this programming interface,
with the connection reference as the index and the incoming interface/time-slot pair and
the outgoing interface/time-slot pair as the fields.

Figure 7. Data tables used by the signaling protocol. [Table schemas:
Routing table: index = destination address; return = next node address, next node interface#.
CAC table: index = next node address; return/written = total bandwidth, available bandwidth.
Connectivity table: index = neighbor address, neighbor interface#; return = own interface#.
State table: index = own connection reference; return/written = previous/next connection references, previous/next node addresses, state, bandwidth.
Switch-Mapping table: index = own connection reference, sequential offset (0 to BW-1); return/written = incoming channel ID (interface#, timeslot#), outgoing channel ID (interface#, timeslot#).]
2.2 Implementation of OCSP on WILDFORCE platform
To demonstrate the feasibility and advantage of hardware accelerated signaling, we
implement OCSP on FPGA. We use the WILDFORCETM re-configurable computing
board shown in Fig. 8. It consists of five XC4000 series Xilinx FPGAs: one XC4036
(CPE0) and four XC4013 (PEn). These five FPGAs can be used for user logic while the
cross-bar provides programmable interconnections between the FPGAs. In addition, there
are three First-In First-Out (FIFO) devices on the board, and one dual-port Random
Access Memory (RAM) device attached to CPE0. The board is connected to the host sys-
tem through the PCI bus. The board supports a C language based Application Program-
ming Interface (API) through which the host system can dynamically configure the
FPGAs and access the on-board FIFOs and RAMs.
Figure 8. Architecture of WILDFORCETM board.
For our prototype implementation, we use CPE0, PE1, FIFO0, FIFO1 and dual-port
RAM. The CPE0 implements the hardware signaling accelerator state machine, State and
Switch-Mapping tables, FIFO0 controller, and dual-port RAM controller. The dual-port
RAM implements the Routing, CAC and Connectivity tables. FIFO0 and FIFO1 work as
receive and transmit buffers for signaling messages. PE1 implements the FIFO1 controller
and provides the data path between CPE0 and FIFO1.
In the following subsections, we discuss some parameters selected for our hardware
implementation, and the state transition diagram of the hardware signaling accelerator. We
also present two novel approaches for managing resources such as time-slots and connec-
tion references.
2.2.1 Parameters
Our implementation supports 5-bit addresses for all nodes, 16 interfaces per switch, a
maximum bandwidth of 16 STS-1s per interface, and a maximum of 32 simultaneous con-
nections at each switch. We are primarily limited by the size of the on-board dual-port
RAM on our off-the-shelf prototype board, and hence are forced to use these small values
for address sizes, etc. The FPGA implementation itself does not limit these parameters.
Hence, we can easily increase the values of these parameters when designing a customized
board.
Recently, there has been significant progress in fast table look-up in both research lit-
erature [35][36] and commercial products. Look-up/classification co-processors are
widely available, such as Silicon Access Networks® iAP, PMC-Sierra® ClassiPI, Solidum®
PAX1200, etc. These chips can easily process 100 million look-ups/sec. In our prototype
implementation, we assume that routing table look-ups can be off-loaded to an external
co-processor, and use an equivalent four-memory-access duration to simulate a routing
table look-up.
2.2.2 State transition diagram of the hardware signaling accelerator
Fig. 9 shows the detailed state transition diagram of the hardware signaling accelera-
tor. When a signaling message arrives, it is temporarily buffered in FIFO0. The hardware
signaling accelerator then reads the message from FIFO0 and delimits the message
according to the Message Length field. The Checksum field is verified. The State table is
consulted to check the current state of the connection. Based on the Message Type field,
the hardware signaling accelerator processes the message accordingly. The processing of
the Setup message involves checking the TTL field, reading the Routing table to determine
the next switch and corresponding output interface, updating the CAC table, reading the
Connectivity table to determine the input interface, allocating a connection reference to
identify the connection, allocating time-slots and programming the Switch-Mapping table.
The Setup-Success message requires no special processing. The processing of the Release
message involves updating the CAC table and releasing the time-slots reserved for the connection.

Figure 9. State transition diagram of the hardware signaling accelerator. [Processing states include reset, ready, receive, parse, check TTL, read the Routing/CAC/Connectivity/State tables, allocate a connection reference and time-slots, write the Switch-Mapping and State tables, free time-slots and the connection reference, and transmit; error transitions cover checksum failure, state mismatch, TTL expiry, no route, unavailable bandwidth, and invalid parameters.]

When processing the Release-Confirm message, the allocated connection refer-
ence is freed and thus, the connection is terminated. After processing any message, the
State table is updated. The new message is generated and buffered in FIFO1 temporarily,
and then transmitted to the next switch along the path.
2.2.3 Managing the available time-slots in hardware
The management of time-slots and connection references is easy in software through
simple array manipulations. However, this poses a challenge in hardware implementa-
tions. Our solution is to use a priority decoder.
Fig. 10 illustrates our implementation of a time-slot manager. Each entry in the time-
slot table is a bit-vector, corresponding to an output interface with the bit-position deter-
mining the time-slot number and the bit-value determining availability of the time-slot
(‘0’ available, ‘1’ used). The priority decoder is used to select the first available time-slot.
When an interface number is provided by the signaling state machine to the time-slot man-
ager, the bit-vector corresponding to the interface is sent into the priority decoder and the
first available time-slot is returned. Then the bit corresponding to the time-slot is marked
as used (from ‘0’ to ‘1’) and the updated bit-vector is written back to the table.

Figure 10. Time-slot manager. [The interface number indexes a table of per-interface bit-vectors; the selected vector is loaded into a scratch register, the priority decoder returns the first available time-slot, the corresponding bit is marked used, and the vector is written back.]

In the
example shown in Fig. 10, the time-slot manager is required to find a free time-slot on
interface 3. It returns time-slot 14 and marks it as “used.” De-allocating a time-slot fol-
lows a similar pattern but the time-slot number is needed as an input in addition to the
interface number in order for the time-slot manager to free the time-slot.
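In software terms the priority decoder is a find-first-zero scan over the interface's bit-vector. A sketch with the prototype's parameters (16 interfaces, 16 time-slots each); the hardware scans in a fixed priority order, assumed here to start from slot 0:

```python
NUM_INTERFACES, SLOTS_PER_IF = 16, 16
# One bit-vector per interface; bit value '1' means the time-slot is in use.
timeslot_table = [0] * NUM_INTERFACES

def alloc_timeslot(interface):
    vec = timeslot_table[interface]
    for slot in range(SLOTS_PER_IF):          # priority decoder: first '0' bit
        if not (vec >> slot) & 1:
            timeslot_table[interface] = vec | (1 << slot)   # mark used, write back
            return slot
    return None                               # no free time-slot on this interface

def free_timeslot(interface, slot):
    # De-allocation needs the time-slot number as well as the interface number.
    timeslot_table[interface] &= ~(1 << slot)
```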
2.2.4 Managing the connection references in hardware
A connection reference is used to identify a connection locally. It is allocated when
establishing a connection and de-allocated when terminating it. A straightforward imple-
mentation of a connection reference manager is a bit-vector combined with a priority
decoder. The priority decoder finds the first available bit-position (a bit marked as ‘0’),
sends its index as the connection reference and updates the bit as used (a bit marked as
‘1’). However, this approach is impractical when there are a large number of connections.
While our actual implementation only used 32 connections per switch, we designed the
connection reference manager to handle 2^12 = 4096 simultaneous connections, which requires a
bit-vector with 4096 entries. This is too large for the simple priority decoder implementation as used for time-slots.

Figure 11. Connection reference manager. [An 8-bit table pointer selects one of 256 rows of 16-bit availability vectors; the selected row is loaded into a scratch register, the priority decoder finds the first ‘0’ bit, the output connection reference is the concatenation of the table pointer and the 4-bit index, the bit is marked used, and the updated vector is written back.]
Our improvement to this basic approach is to use a table with 256 entries of 16-bit vec-
tors to record the availability of a total of 4096 connection references. Fig. 11 illustrates
this approach. With 4096 connections, we need a 12-bit connection reference. The first 8
bits of the connection reference correspond to the table pointer, while the remaining 4 bits
correspond to the first available connection reference from among the 16 pointed to by the
table pointer. The connection reference manager starts with the table pointer set to 0. If
any of the 16 connection references corresponding to this row of the table are available
(i.e., a bit position is ‘0’), the priority decoder will identify this index and write the output
connection reference as a concatenation of the 8-bit table pointer and the 4-bit index
extracted. In the example shown in Fig. 11, the 12th bit in the first row is a ‘0’. Therefore it
outputs the connection reference number 12. The bit-position is marked as used as illus-
trated with steps 5 - 7 of Fig. 11.
De-allocating follows a similar approach; the bit corresponding to the connection ref-
erence is reset to ‘0’ and the updated bit-vector is written back to the table. We can paral-
lelize this approach by partitioning the table into several smaller tables, each with a
pointer and a priority decoder, forming several smaller managers. All these managers
work concurrently. A round-robin style counter can be used to choose a connection reference among the managers. Thus, this approach can be generalized if more than 4096 connections are to be handled.
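The two-level allocator described above can be sketched in software as follows. This is our C illustration of the scheme, not the thesis's VHDL; the table dimensions match the text (256 rows of 16-bit vectors), the priority decoder is emulated with a bit scan, and all names are ours.

```c
#include <stdint.h>

#define ROWS 256          /* addressed by the 8-bit table pointer */
#define BITS_PER_ROW 16   /* addressed by the 4-bit index         */

typedef struct {
    uint16_t avail[ROWS]; /* bit = '1' means reference in use, '0' free */
    int row;              /* current table pointer */
} connref_mgr;

/* Allocate the first free 12-bit connection reference, or -1 if none. */
static int connref_alloc(connref_mgr *m)
{
    for (int scanned = 0; scanned < ROWS; scanned++) {
        uint16_t v = m->avail[m->row];
        if (v != 0xFFFF) {                    /* this row has a free slot */
            int idx = 0;
            while (v & (1u << idx))           /* emulated priority decoder */
                idx++;
            m->avail[m->row] |= (1u << idx);  /* mark used, write back */
            return (m->row << 4) | idx;       /* concat pointer and index */
        }
        m->row = (m->row + 1) % ROWS;         /* pointer adjust */
    }
    return -1;                                /* all 4096 references in use */
}

/* De-allocate: clear the bit addressed by the 12-bit reference. */
static void connref_free(connref_mgr *m, int ref)
{
    m->avail[(ref >> 4) & 0xFF] &= ~(1u << (ref & 0xF));
}
```

Allocation is O(1) in the common case because the table pointer lingers on a row until it fills, mirroring the hardware's behavior.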
2.2.5 Implementation and simulation results
We develop a prototype VHDL model for the hardware signaling accelerator, use the Synplify tool for synthesizing the design, and the Xilinx Alliance tool for the placement and routing of the design. PE0 (XC4036) uses 62% of its resources, while PE1 (XC4013) uses 8% of its resources.
We perform timing simulations of the hardware signaling accelerator using ModelSim
simulator. The simulation results for the Setup message are shown in Fig. 12. It can be
seen that while receiving and transmitting a Setup message (requesting a bandwidth of
STS-12 at a cross-connect rate of STS-1) consumes 12 clock cycles each, processing of a Setup message consumes 53 clock cycles. Overall, this translates into 77 clock cycles to receive, process and transmit a Setup message. We estimate a maximum of 101 clock cycles if multiple (four) routing table look-ups are needed.
Processing Setup-Success, Release and Release-Confirm messages consumes about 70 clock cycles total since these messages are much shorter (2 32-bit words versus 11 32-bit words for Setup) and require simpler processing. A detailed breakdown of the clock cycles consumed to process each of these signaling messages is shown in Table 1. Assuming a 25MHz clock, this translates into 3.1 to 4.0 microseconds for Setup message processing and about 2.8 microseconds for the combined processing of Setup-Success, Release and Release-Confirm messages. Thus, a complete setup and teardown of a connection consumes about 6.6 microseconds. Accordingly, the achievable call handling rate
Figure 12. Timing simulation for Setup message (receive, process and transmit phases).
is as high as 150,000 calls/sec, a 100X-1000X speed-up compared to software-based implementations of signaling protocols.
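The arithmetic behind these figures can be checked with a small C sketch; the 25 MHz clock and the cycle counts are taken from the text above, and the helper names are ours.

```c
/* Convert cycle counts at a 25 MHz clock into microseconds, and bound
 * the call handling rate by the setup + teardown latency. */

static double us_at_25mhz(int cycles)
{
    return cycles / 25.0;              /* 25 cycles per microsecond */
}

static double calls_per_sec(double setup_teardown_us)
{
    return 1e6 / setup_teardown_us;    /* one call per setup + teardown */
}
```

For example, 77 cycles yield about 3.1 us, 70 cycles yield 2.8 us, and a 6.6 us setup-plus-teardown total bounds the rate at roughly 150,000 calls/sec.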
Table 1. Clock cycles consumed by various OCSP messages.
Message: Setup | Setup-Success | Release | Release-Confirm
Clock cycles: 77-101 | 9 | 51 | 10
Chapter 3 A Subset of RSVP-TE for Hardware-
Acceleration
The hardware implementation of OCSP demonstrated a call handling rate of 150,000 calls/sec, which is a 100X-1000X speedup vis-à-vis its software counterpart. However, OCSP, designed by us as a proof of concept, is specifically engineered for performance and
limited to SONET networks. It is important to demonstrate our concept of hardware-accel-
erated signaling with an implementation of a widely used signaling protocol.
Among the signaling protocols we mentioned in Chapter 1, RSVP-TE for GMPLS networks attracts our attention. It is one of the most widely implemented signaling protocols. As one of the two signaling protocols recommended for GMPLS (the other is CR-LDP), RSVP-TE can work with different types of connection-oriented networks, such as SONET/SDH and DWDM networks. Most telecom equipment vendors support
RSVP-TE in their switches.
Implementing RSVP-TE for GMPLS in hardware is a challenge because of its com-
plexity. Our solution is to only implement the time-critical functions in hardware and rele-
gate the non-time-critical functions to software. For this purpose, it is essential to extract a
subset of RSVP-TE for hardware-acceleration. In Section 3.1, we briefly review the his-
tory of RSVP-TE for GMPLS. We then detail the subset we extract in Section 3.2. This
subset is targeted for implementation at a transit SONET switch and it only supports point-
to-point unidirectional connections.
3.1 Review of RSVP-TE for GMPLS networks
The original RSVP was initially defined as a resource reservation setup protocol for
IntServ [37]. RSVP can be used by a host to request specific QoS from the network. It can
also be used by routers to deliver QoS requests to all nodes along the path, and to establish
and maintain state to provide the requested service.
RSVP was designed to work in a multicast scenario. In order to efficiently accommo-
date large groups, dynamic group membership, and heterogeneous receiver requirements,
RSVP makes receivers responsible for requesting a specific QoS. RSVP requests
resources for simplex flows, i.e., in one direction only. To reserve resources in both directions, two separate sets of RSVP message exchanges are required.
When the MPLS community was looking for a signaling protocol for MPLS, they first
came up with LDP and CR-LDP, which are specifically designed for MPLS. At the same
time, an enhanced version of RSVP with traffic engineering extensions (RSVP-TE) [11]
was introduced as an alternative for MPLS signaling. RSVP-TE and CR-LDP are functionally similar, but RSVP-TE is more popular in the industry.
RSVP-TE was first designed for packet-switched MPLS networks. When MPLS was
generalized for circuit-switched networks, such as SONET/SDH and DWDM networks, RSVP-TE was also extended to support these network technologies [15]. In addition,
technology-specific extensions were defined to address technology-specific issues. For
example, [38][39] defined the traffic parameters and labels for SONET/SDH networks,
and G.709 Optical Transport Networks (OTNs), respectively.
RSVP supports the notion of soft-state and periodic refreshing. If a refresh is not
received before the time-out interval expires, connections are released. Since packet forwarding is based on the IP routing table, resource reservations need to follow as routing data changes. Hence RSVP included the use of refresh messages. A second reason
for refresh messages is that since RSVP uses unreliable IP service, the occasional loss of
an RSVP message is handled through refreshes. However, the first reason does not hold for GMPLS networks, where once a circuit is established, the routing table is not consulted for data forwarding. Moreover, refreshing is not a good solution to the reliability issue: if
the refresh interval is small, the overhead spent in processing refresh messages can
become excessive; while if the refresh interval is large, it takes longer to detect the loss of
an RSVP message. Therefore RFC2961 [40] introduced a TCP-like mechanism to ensure
reliable message transport.
The RSVP-TE and related protocols are revised often and enhanced to accommodate
new applications while maintaining backward compatibility. The subset we describe in the
next section is mainly based on the original RSVP [16], the RSVP-TE for MPLS networks
[11], the RSVP-TE extensions for GMPLS networks [15], the SONET/SDH extensions
for GMPLS networks [38], and the RSVP refresh overhead reduction extensions [40].
3.2 A Subset of RSVP-TE for GMPLS networks
3.2.1 RSVP messages
All RSVP messages begin with a common header, followed by a body consisting of a
variable number of variable-length, typed “objects”.
3.2.1.1 RSVP common header
Fig. 13 shows the structure of the RSVP common header. The 4-bit Vers field specifies the protocol version number. Current RSVP (including RSVP-TE and its extensions) is version 1. RSVP reserves 4 bits for Flags. No flag bits are defined in [11][15][16]; in [40], bit 0 was introduced as the Refresh-Reduction-Capable bit, whose value ‘1’ indicates support for the refresh overhead reduction extensions. The RSVP_Checksum field contains the one’s complement of the one’s complement sum of the whole message; it helps detect message errors. The RSVP_Length field indicates the total length of the RSVP message in bytes.
Vers | Flags | Msg Type | RSVP Checksum
Send_TTL | (Reserved) | RSVP Length
Figure 13. RSVP message common header.
The Msg_Type field specifies the type of the message. There are 7 messages defined in traditional RSVP: Path, Resv, PathErr, ResvErr, PathTear, ResvTear, and ResvConf.
RSVP-TE added Hello message for the purpose of node failure detection. When RSVP-
TE was enhanced for GMPLS, Notify message was introduced to support fast failure noti-
fication.
RSVP messages are carried directly in IP packets; the corresponding protocol number in the IP header is 46. The IP header contains a TTL field. The RSVP common header also has a Send_TTL field, which records the TTL value at the transmitting node. This field can be used to detect the presence of non-RSVP-capable nodes: if the received TTL in the IP header differs from the transmission TTL stored in the common header, non-RSVP nodes must exist between the two neighboring RSVP-capable nodes.
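The RSVP_Checksum computation is the standard Internet checksum. As a sketch (ours, not the thesis VHDL): the checksum field is zeroed, the message is summed as big-endian 16-bit words, carries are folded back in, and the result is complemented. Recomputing over a message that carries a correct checksum yields zero.

```c
#include <stddef.h>
#include <stdint.h>

/* One's complement of the one's complement sum of the whole message.
 * The RSVP_Checksum field must be zero when this is computed. */
uint16_t rsvp_checksum(const uint8_t *msg, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(msg[i] << 8) | msg[i + 1];  /* 16-bit words */
    if (len & 1)                        /* odd length: pad with a zero byte */
        sum += (uint32_t)msg[len - 1] << 8;
    while (sum >> 16)                   /* fold carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;              /* one's complement */
}
```

A receiver can verify a message by summing it including the stored checksum and checking for zero.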
3.2.1.2 RSVP messages
In total, nine RSVP messages are defined in [11][15][16]. Among these messages, Path
and Resv messages are used to set up an LSP, PathTear/ResvTear is used to tear down an
LSP. These four messages should be processed by hardware. ResvConf message, used to
confirm a Resv message, is triggered by an optional object, RESV_CONFIRM, in the Resv message. We assume that normally there is no ResvConf message in our application. All other messages, PathErr, ResvErr, Hello, and Notify, are non-time-critical. These four messages, together with ResvConf, should be processed by software.
This section details the format of the four RSVP messages to be processed by hard-
ware. Messages are defined in Backus-Naur Form (BNF)1 format. We omit all optional
objects in the messages. The order of the objects in a message is recommended, but not
mandatory, meaning an implementation should create messages with the objects in the
order shown below, but accept the objects in any permissible order.
1. Path message
A Path message is used by an upstream Label Switching Router (LSR) to ask a down-
stream LSR to allocate a label for an LSP. More generally, it is a request to set up an LSP.
There are 6 mandatory objects in a Path message: SESSION, RSVP_HOP, TIME_VALUES, LABEL_REQUEST, SENDER_TEMPLATE, and SENDER_TSPEC.
<Path Message>::= <Common Header>
<SESSION>
<RSVP_HOP>
<TIME_VALUES>
<LABEL_REQUEST>
<sender descriptor>
<sender descriptor>::= <SENDER_TEMPLATE>
<SENDER_TSPEC>
1. Backus-Naur Form, introduced by John Backus and improved by Peter Naur, is one of the most commonly used meta-syntactic notations for specifying the syntax of programming languages, command sets, and the like. The meta-symbols of BNF are: ::= meaning “is defined as”; | meaning “or”; <> angle brackets used to surround item names; [] square brackets used to surround optional items. The BNF implies an order for the items.
2. Resv Message
A Resv message is used by a downstream LSR to advertise the binding of a label to a
specific LSP. After a label is allocated for an LSP, a Resv message, which carries the label
assigned to the LSP, is sent to the upstream LSR. There are 7 mandatory objects in a Resv message: SESSION, RSVP_HOP, TIME_VALUES, STYLE, FLOWSPEC, FILTER_SPEC, and LABEL.
<Resv Message>::= <Common Header>
<SESSION>
<RSVP_HOP>
<TIME_VALUES>
<STYLE>
<flow descriptor list>
<flow descriptor list>::= <FLOWSPEC>
<FILTER_SPEC>
<LABEL>
3. PathTear and ResvTear Messages
In RSVP-TE, either the source or the destination can tear down an LSP. The source node tears
down an LSP by sending out a PathTear message, while the destination node can do the
same by sending out a ResvTear message.
<PathTear Message>::= <Common Header>
<SESSION>
<RSVP_HOP>
<sender descriptor>
<sender descriptor>::= <SENDER_TEMPLATE>
<ResvTear Message> :: =<Common Header>
<SESSION>
<RSVP_HOP>
<STYLE>
<flow descriptor list>
<flow descriptor list>::= <FLOWSPEC>
<FILTER_SPEC>
3.2.2 RSVP objects
Objects following the common header are the real carriers of message fields. Each
object, a Type-Length-Value (TLV) triplet, is a self-contained element, consisting of one
or more 32-bit words with a one-word header. Fig. 14 shows the construct of an RSVP
object.
As the name suggests, a TLV-structured object consists of a Type field, a Length field,
and a variable length Value field. For RSVP, the Type field can be further split into Class-
Num field and C-Type field. The Class-Num field specifies the object class, while the C-
Type field further qualifies the type within the object class. The Length field specifies the
length of the object. Fig. 15 shows the SESSION object as an example. The SESSION object is mandatory in every RSVP message; its Class-Num is 1. In [16], the SESSION object is used to identify an RSVP session, with a corresponding C-Type of 1. For this
purpose, the object contents include the destination address, protocol ID, and destination
port number (i.e., application running on the destination for which this connection is being
established), as shown in Fig. 15a. With the extension of RSVP to RSVP-TE, a new C-
Type was defined (C-Type of 7). The new SESSION object, including destination address,
Length (bytes) Class-Num C-Type
(Object contents)...
Figure 14. RSVP object.
tunnel ID, and extended tunnel ID, is used to identify an LSP tunnel.
As RSVP-TE and related signaling protocols evolve, some fields and parameters become obsolete while new ones are introduced. The TLV-structured object is flexible and extensible, and thus well suited to these evolving protocols. In this section, we discuss all mandatory objects
in Path, Resv, PathTear, and ResvTear messages.
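The object layout described above (16-bit Length in bytes including the 4-byte header, 8-bit Class-Num, 8-bit C-Type, then the value) can be walked with a simple parser. The following C sketch is our illustration of that dispatch step; the struct and function names are ours.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t length;     /* total object length in bytes, incl. header */
    uint8_t  class_num;  /* object class */
    uint8_t  c_type;     /* type within the object class */
    const uint8_t *value;
} rsvp_obj;

/* Parse one TLV object at buf; return bytes consumed, or -1 on error.
 * Wire fields are big-endian. */
int parse_object(const uint8_t *buf, size_t avail, rsvp_obj *out)
{
    if (avail < 4)
        return -1;
    out->length    = (uint16_t)((buf[0] << 8) | buf[1]);
    out->class_num = buf[2];
    out->c_type    = buf[3];
    out->value     = buf + 4;
    /* length must cover the header, be 32-bit aligned, and fit */
    if (out->length < 4 || (out->length & 3) || out->length > avail)
        return -1;
    return out->length;
}
```

A message body is processed by calling this in a loop, advancing by the returned length, and dispatching on (class_num, c_type).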
3.2.2.1 FILTER_SPEC/SENDER_TEMPLATE object
The FILTER_SPEC/SENDER_TEMPLATE object specifies the source address of an
LSP. The former is carried in the Resv message with a Class-Num of 10, while the latter is carried in the Path message with a Class-Num of 11 (Fig. 16). C-Type 7 is defined for IPv4 tunnels.
FILTER_SPEC/SENDER_TEMPLATE object, together with SESSION object, forms a five
tuple <Src_IP_Addr, LSP_ID, Dest_IP_Addr, Tunnel_ID, Ext_Tunnel_ID>, which
uniquely identifies an LSP.
The 32-bit IPv4 tunnel sender address is the IP address of the sender, and the 16-bit LSP ID is a locally (at the sender) allocated ID for the LSP. The FILTER_SPEC/SENDER_TEMPLATE object remains unchanged along the whole LSP.
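The five-tuple that uniquely identifies an LSP can be modeled as a lookup key. This C sketch mirrors the field names in the text; it is our illustration of how the connection state could be keyed, not the thesis's register layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* <Src_IP_Addr, LSP_ID, Dest_IP_Addr, Tunnel_ID, Ext_Tunnel_ID> */
typedef struct {
    uint32_t src_ip_addr;    /* IPv4 tunnel sender address       */
    uint16_t lsp_id;         /* sender-allocated LSP ID          */
    uint32_t dest_ip_addr;   /* IPv4 tunnel end point address    */
    uint16_t tunnel_id;
    uint32_t ext_tunnel_id;
} lsp_key;

/* Two messages refer to the same LSP iff all five fields match. */
bool lsp_key_equal(const lsp_key *a, const lsp_key *b)
{
    return a->src_ip_addr == b->src_ip_addr &&
           a->lsp_id == b->lsp_id &&
           a->dest_ip_addr == b->dest_ip_addr &&
           a->tunnel_id == b->tunnel_id &&
           a->ext_tunnel_id == b->ext_tunnel_id;
}
```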
3.2.2.2 FLOWSPEC/SENDER_TSPEC object
The FLOWSPEC/SENDER_TSPEC object specifies the required traffic parameters of
Figure 15. TLV structure of the SESSION object (generic TLV: Length | Type | Value).
(a) SESSION object defined in RFC 2205 (RSVP): Length (12) | Class-Num (1) | C-Type (1); IPv4 Dest Address; Protocol ID | Flags | Dst Port.
(b) SESSION object defined in RFC 3209 (RSVP-TE): Length (16) | Class-Num (1) | C-Type (7); IPv4 tunnel end point address; Must be zero | Tunnel ID; Extended Tunnel ID.
Figure 16. FILTER_SPEC/SENDER_TEMPLATE object: Length | Class-Num (10/11) | C-Type (7); IPv4 tunnel sender address; MUST be zero | LSP ID.
an LSP. The former is carried in the Resv message (Class-Num 9), the latter in the Path message (Class-Num 12), Fig. 17. The FLOWSPEC and SENDER_TSPEC objects were first introduced in [16], where the interpretation of these two objects follows IntServ [37]. RSVP-TE (GMPLS) with extensions supporting SONET/SDH introduced a new C-Type (TBA).
A SONET/SDH signal has a hierarchical structure. Each signal has an elementary signal, which can be further concatenated (contiguously or virtually) to form higher-rate signals. The Signal_Type field specifies the type of the elementary signal. The RCC field is used to request contiguous concatenation. If the RCC field is ‘1’, contiguous concatenation is used, and the NCC field indicates the number of identical elementary signals that are requested to be concatenated. The NVC field indicates the number of signals that are requested to be virtually concatenated. Based on all previous fields, the MT field indicates
the number of identical signals that are requested for the LSP, i.e., that form the final sig-
nal. These signals can be either identical elementary signals, or identical contiguously
concatenated signals, or identical virtually concatenated signals.
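The field semantics above can be condensed into a small helper that derives the number of elementary signals a request consumes: NCC signals per component if contiguous concatenation is requested, NVC if virtual concatenation is used, one otherwise, and MT identical components in total. This is our reading of the text, sketched in C, not a standard API.

```c
#include <stdint.h>

/* Number of elementary signals requested by a SENDER_TSPEC/FLOWSPEC,
 * given the RCC, NCC, NVC and MT fields described in the text. */
uint32_t requested_elementary_signals(uint8_t rcc, uint16_t ncc,
                                      uint16_t nvc, uint16_t mt)
{
    uint32_t per_component = 1;          /* a single elementary signal   */
    if (rcc & 1)
        per_component = ncc;             /* contiguous concatenation     */
    else if (nvc > 0)
        per_component = nvc;             /* virtual concatenation        */
    return (uint32_t)mt * per_component; /* MT identical components      */
}
```

For example, one STS-3c over STS-1 elementary signals is RCC=1, NCC=3, MT=1, i.e., three elementary signals.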
3.2.2.3 Generalized LABEL object
The LABEL object was initially defined in [16] to support the setup of an LSP tunnel,
Class-Num 16, C-Type 1. In [13]-[15], the LABEL object is generalized to represent time-
slot, wavelength, switch port, etc., as shown in Fig. 18. The generalized LABEL object,
carried in Resv message, extends the traditional label by allowing the representation of not
Length Class-Num (9/12) C-Type (TBA)
Signal Type RCC NCC
NVC Multiplier (MT)
Transparency (T)
Figure 17. FLOWSPEC/SENDER_TSPEC object.
only labels which travel in-band with associated data packets, but also labels which identify time-slots, wavelengths, or space-division multiplexed positions. In our application,
SONET, a label represents a set of time-slots.
Each generalized LABEL object carries a variable length label. The format of the label
depends on the application. Fig. 19 shows the format of a SONET/SDH label. The expla-
nations of these fields are given in Fig. 20.
3.2.2.4 Generalized LABEL_REQUEST object
The LABEL_REQUEST object defines the characteristics of the requested LSP. It was
initially defined in [11] to support the setup of an LSP tunnel. In [13]-[15], a generalized
LABEL_REQUEST object was introduced, as shown in Fig. 21. The suggested C-Type is 4.
Figure 18. LABEL object (Length | Class-Num (16) | C-Type (2); variable-length Label).
Figure 19. Format of a SONET/SDH label (fields S, U, K, L, M).
Figure 20. SONET multiplexing hierarchy and generalized label. For example, S>0, U=1, K=0, L=0, M=0 denotes the unstructured SPE of the s-th STS-3 and the 1st STS-1.
The 8-bit LSP Encoding Type indicates the encoding of the LSP being requested. The
value for SONET/SDH is 5. The 8-bit Switching Type specifies the type of switching per-
formed on a link. The value for “Time-Division-Multiplex Capable” is 100 (our implemen-
tation is for transit SONET switches, which use TDM technology). G-PID identifies the
payload carried by the LSP.
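The LABEL_REQUEST contents fit in a single 32-bit word: 8-bit LSP Encoding Type, 8-bit Switching Type, 16-bit G-PID. As a sketch (our helper, using the values from the text: encoding 5 for SONET/SDH and switching type 100 for TDM-capable):

```c
#include <stdint.h>

/* Pack the generalized LABEL_REQUEST contents word:
 * bits 31-24: LSP Encoding Type, bits 23-16: Switching Type,
 * bits 15-0: G-PID. */
uint32_t pack_label_request(uint8_t lsp_enc_type, uint8_t switching_type,
                            uint16_t gpid)
{
    return ((uint32_t)lsp_enc_type << 24) |
           ((uint32_t)switching_type << 16) |
           gpid;
}
```

With encoding 5 and switching type 100 (0x64), the word is 0x0564 followed by the G-PID.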
3.2.2.5 RSVP_HOP object
The RSVP_HOP object was initially defined in [16], Class-Num 3, C-Type 1-2. In
[15], in order to support out-of-band signaling (separation of control channel and data
channel), a new C-Type, with support for interface ID, was added. The suggested C-Type
value is 3.
In RSVP, the RSVP_HOP object carries the IP address of the RSVP-capable node that sent this message; it is called PHOP (“previous hop”) in downstream messages (Path messages) or NHOP (“next hop”) in upstream messages (Resv messages). When RSVP-TE is extended for GMPLS, in order to support the separation of control channel and data channel, an IF_INDEX TLV is added, Fig. 22.
The 32-bit IPv4 Next/Previous Hop Address specifies the IP address of the previous
Length | Class-Num (19) | C-Type (4)
LSP Enc. Type | Switching Type | G-PID
Figure 21. LABEL_REQUEST object.
Length (24) | Class-Num (3) | C-Type (3)
IPv4 Next/Previous Hop Address
Logical Interface Handle (LIH)
IF_INDEX TLV: Type (3) | Length (12) | IP Address | Interface ID
Figure 22. RSVP_HOP object.
LSR (in a Path message) or next LSR (in a Resv message) on the control plane. The 32-bit
LIH specifies the logical outgoing interface on which the reservation is required.
IF_INDEX itself is a TLV. Inside it, the 32-bit IP Address specifies the IP address of the
previous LSR or next LSR on the user plane. The 32-bit Interface ID can be used to iden-
tify a physical interface.
3.2.2.6 SESSION object
The SESSION object was initially defined in [16], Class-Num 1, C-Type 1/2. Refer-
ence [11] defined C-Types 7/8 for LSP tunnel. In our application, C-Type is 7, meaning
IPv4 tunnel, Fig. 23. SESSION object is carried in all four RSVP-TE messages. As men-
tioned before, SESSION object, together with SENDER_TEMPLATE/FILTER_SPEC
object, uniquely identifies an LSP.
The 32-bit IPv4 tunnel end point address specifies the destination address of an LSP.
The 16-bit tunnel ID and 32-bit extended tunnel ID are used to identify an LSP tunnel.
The STYLE and TIME_VALUES objects are also mandatory in [16]. However, these
two objects become obsolete in [11][15] when RSVP was extended for MPLS and
GMPLS. We include these two objects in the subset because they are mandatory. We
expect that the hardware will accept but not process these two objects. Therefore the
details of these two objects are omitted.
3.2.3 Optional objects support
Besides the mandatory objects described above, we also support several optional
Length Class-Num (1) C-Type (7)
IPv4 tunnel end point address
Must be zero Tunnel ID
Extended Tunnel ID
Figure 23. SESSION object.
objects in the subset, i.e., SUGGESTED_LABEL, MESSAGE_ID and MESSAGE_ID_
ACK. Although these objects are currently optional, we expect them to be widely adopted
in implementations of RSVP-TE for GMPLS networks.
3.2.3.1 SUGGESTED_LABEL object
The SUGGESTED_LABEL object was first introduced in [15] for the purpose of allo-
cating labels in the forwarding direction, Class-Num 129, C-Type 2. The
SUGGESTED_LABEL object has the same structure as the generalized LABEL object.
There is a potential conflict in SONET networks if the SUGGESTED_LABEL object in the Path message is left as optional. Consider the following scenario. An outgoing interface
(STS-12) has four available time slots, “000 011 111 111.” A first call requests an STS-3c;
the upstream switch tentatively reserves the first three time slots. A second call requesting
an STS-1 arrives next, and needs to be routed on this same outgoing interface. The switch
will then make a tentative reservation of the remaining time slot. If the Resv message for
the second call returns first, and the downstream switch assigns a time slot different from
the one tentatively reserved in forward direction, the first call for which a tentative reser-
vation was made can no longer be accommodated because it requires a concatenated
assignment. Hence, we recommend that the SUGGESTED_LABEL object be mandatory in
Path messages to force the downstream switch to use the label selected by the upstream
switch. This is the result of RSVP-TE growing out of RSVP, which was developed as a
protocol for receiver-initiated additions to a multicast tree. In GMPLS networks, where
hard resource reservations of time slots and wavelengths are necessary, a reservation and
corresponding time-slot/wavelength selection need to be made in the forward direction of
call setup.
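The conflict in the scenario above can be made concrete with a small check: an STS-3c needs three free time slots aligned on an STS-3 boundary. This C sketch uses our own bit convention (bit i of the vector is time slot i+1, ‘0’ = free, ‘1’ = used, as in the text’s “000 011 111 111” example) and is illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Can an STS-3c still be placed on a 12-slot (STS-12) interface?
 * `used` holds one bit per slot in bits 0..11; an STS-3c needs
 * three free slots aligned on an STS-3 boundary (1-3, 4-6, 7-9, 10-12). */
bool sts3c_fits(uint16_t used)
{
    for (int base = 0; base < 12; base += 3)
        if (((used >> base) & 0x7) == 0)   /* three aligned free slots */
            return true;
    return false;
}
```

In the text's scenario, slots 1-4 are free (used = 0xFF0) and an STS-3c fits in slots 1-3. If the downstream switch grants the STS-1 call slot 1 instead of the tentatively reserved slot 4, the free slots become 2-4 (used = 0xFF1) and no aligned triple remains, which is exactly the conflict the mandatory SUGGESTED_LABEL avoids.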
3.2.3.2 MESSAGE_ID/MESSAGE_ID_ACK object
The MESSAGE_ID and MESSAGE_ID_ACK objects were defined in [40] to support
reliable transmission of RSVP messages, the former with Class-Num 23, C-Type 1, the lat-
ter with Class-Num 24, C-Type 1, as shown in Fig. 24.
RSVP-TE runs on top of the connectionless, unreliable IP protocol. In [40], a TCP-like mechanism is introduced to provide reliable message transmission. This mechanism includes the MESSAGE_ID and MESSAGE_ID_ACK objects and an exponential back-off retransmission algorithm. Unlike TCP, there is no window-based flow control.
This mechanism is defined on a per hop basis. Each RSVP message (Path, Resv, Path-
Tear/ResvTear, etc.) includes a MESSAGE_ID object. This MESSAGE_ID object, together
with the message generator’s IP address, uniquely identifies a message. When a message
is received, the contents of the MESSAGE_ID object are copied to a MESSAGE_ID_ACK
object, the MESSAGE_ID_ACK object is then sent back to the message generator as an
acknowledgement. The MESSAGE_ID_ACK object can be piggy-backed in another RSVP
message, or carried in a separate Ack message. Multiple MESSAGE_ID_ACKs can be car-
ried in one RSVP message. If the message generator fails to get back the
MESSAGE_ID_ACK before a re-transmission timer expires, the message generator will
re-transmit the message. The time-out value of the re-transmission timer doubles per re-
transmission (exponential back-off).
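The retransmission policy just described can be sketched as a per-message timer: on each expiry without an acknowledgement the message is re-sent and the time-out doubles, up to a retry limit. The structure, names, and the notion of a configured retry limit are our assumptions for illustration; only the doubling behavior comes from the text.

```c
#include <stdint.h>

typedef struct {
    uint32_t timeout_ms;   /* current re-transmission time-out */
    int      tries_left;   /* retransmissions still allowed    */
} rexmit_timer;

void rexmit_init(rexmit_timer *t, uint32_t initial_ms, int retry_limit)
{
    t->timeout_ms = initial_ms;
    t->tries_left = retry_limit;
}

/* Called when the timer expires with no MESSAGE_ID_ACK received.
 * Returns the time-out (ms) to arm for this retransmission, or 0 to
 * give up. The time-out doubles per retransmission (back-off). */
uint32_t rexmit_expire(rexmit_timer *t)
{
    if (t->tries_left-- <= 0)
        return 0;
    uint32_t now = t->timeout_ms;
    t->timeout_ms *= 2;        /* exponential back-off */
    return now;
}
```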
Length Class-Num (23/24) C-Type (1)
Flags Epoch
Message_Identifier
Figure 24. MESSAGE_ID/MESSAGE_ID_ACK object.
3.2.4 Connection setup and teardown with RSVP-TE signaling messages
Similarly to Fig. 1, Fig. 25 shows the procedure to set up an end-to-end unidirectional
connection with RSVP-TE signaling messages. Path messages, carrying the destination
address, the required traffic parameters, and other message fields, progress from the
source to destination hop-by-hop, and Resv messages travel in the reverse direction, back
to the source. The same five steps as mentioned in Chapter 1 are carried out at each
switch. Since we support SUGGESTED_LABEL object in our subset, the first three steps
are performed in the forward direction, i.e., when Path messages are processed. Upon
completion of data exchange, either source or destination can release the connection. The
source node can initiate the release procedure with PathTear messages and the destination
node can do the same with ResvTear messages. In RSVP-TE, the release procedure is not
confirmed.
Figure 25. Illustration of connection setup with RSVP-TE messages: the source is connected to the destination through switches SW1-SW5 (message sequence in the figure: 1-4 Path, 5-8 Tear, 9 connection (circuit or virtual circuit) established).
Chapter 4 RSVP-TE Procedures for Hardware-
Acceleration
In Chapter 3, we extracted a subset of RSVP-TE for hardware acceleration. This sub-
set is described in a language of communication protocols. In this chapter, we describe the
procedures needed to implement this subset of RSVP-TE. Registers and data tables used
in the hardware implementation are defined.
Four types of messages are handled in hardware, Path, Resv, PathTear, and ResvTear.
As each message is received, it is parsed out and different fields of the objects are dis-
patched to corresponding registers. We list these registers in Section 4.1. Processing of
signaling messages involves reading and writing data tables. We describe these data tables
in Section 4.2. In Section 4.3, we use the Path message as an example to illustrate the detailed procedures to process a typical RSVP-TE signaling message.
4.1 Registers
Registers are commonly used in generic micro-processors, DSPs, ASICs, and other digital VLSIs to store data temporarily. We define 36 registers to buffer message fields and intermediate data. We also define 8 configuration registers.
Table 2. Data registers.
Register | Width (bits) | Description
Msg_Type | 8 | Message type, Msg Type in Common Header
Msg_Len | 16 | Message length, RSVP Length in Common Header
Local_Chksum | 16 | Locally calculated checksum
Send_TTL | 8 | Send TTL, Send_TTL in Common Header
Src_IP_Addr | 32 | Source IP address, IPv4 tunnel sender address in SENDER_TEMPLATE object within Path message or FILTER_SPEC object within Resv message
LSP_ID | 16 | LSP ID in SENDER_TEMPLATE object within Path message or FILTER_SPEC object within Resv message
Dest_IP_Addr | 32 | Destination IP address, IPv4 tunnel end point address in SESSION object within Path/Resv message
Tunnel_ID | 16 | Tunnel ID in SESSION object within Path/Resv message
Ext_Tunnel_ID | 32 | Extended tunnel ID in SESSION object within Path/Resv message
Pre_IP_Addr_Ctrl | 32 | Previous IP address on control plane, IPv4 Previous Hop Address in RSVP_HOP object within Path message
Next_IP_Addr_Ctrl | 32 | Next IP address on control plane, the result of User/Control Mapping table look-up
LIH | 32 | Logical Interface Handle in RSVP_HOP object within Path message
Pre_IP_Addr_User | 32 | Previous IP address on user plane, IP Address in IF_INDEX TLV within Path message
Pre_IF_ID_User | 32 | Upstream node output interface ID, Interface ID in IF_INDEX TLV within Path message
Next_IP_Addr_User | 32 | Next hop IP address, the result of Routing table look-up
Seq_Num | 4 | Sequence number of the links between two neighboring LSRs
Incoming_IF_ID_User | 32 | Incoming interface ID, the result of Incoming Connectivity table look-up
Outgoing_IF_ID_User | 32 | Outgoing interface ID, the result of Outgoing Connectivity/CAC table look-up
Incoming_PHY_ID | 8 | Incoming physical interface ID
Outgoing_PHY_ID | 8 | Outgoing physical interface ID
Incoming_Assigned_Timeslots | 12 | Time-slots assigned to a certain incoming logical link
Outgoing_Assigned_Timeslots | 12 | Time-slots assigned to a certain outgoing logical link
Incoming_Label(s) | 32 | Generalized label(s) for the incoming interface, locally allocated
Outgoing_Label(s) | 32 | Generalized label(s) for the outgoing interface, received from downstream LSR
Signal_Type | 8 | Signal Type in SENDER_TSPEC object within Path message or FLOWSPEC object within Resv message
RCC | 8 | Requested Contiguous Concatenation in SENDER_TSPEC object within Path message or FLOWSPEC object within Resv message
NCC | 16 | Number of Contiguous Concatenation in SENDER_TSPEC object within Path message or FLOWSPEC object within Resv message
NVC | 16 | Number of Virtual Concatenation in SENDER_TSPEC object within Path message or FLOWSPEC object within Resv message
MT | 16 | Multiplier in SENDER_TSPEC object within Path message
State | 4 | Current state of an LSP
Avail_BW | 12 | Available bandwidth, the result of Outgoing CAC table look-up
LSP_Enc_Type | 8 | LSP Encoding Type in LABEL_REQUEST object within Path message
Switching_Type | 8 | Switching Type in LABEL_REQUEST object within Path message
G-PID | 16 | Generalized PID in LABEL_REQUEST object within Path message
Refresh_Period | 32 | Refresh Period in TIME_VALUES object within Path message
Style | 5 | Option vector in STYLE object within Resv message
Table 3. Configuration registers.
Register | Width (bits) | Init value | Meaning
Init_LSP_Enc_Type | 8 | 5 | SDH ITU-T G.707/SONET ANSI T1.105
Init_Switching_Type | 8 | 100 | Time-Division-Multiplex Capable (TDM)
Local_IP_Addr_Ctrl | 32 | - | Local IP address on the control plane
Local_IP_Addr_User | 32 | - | Local IP address on the user plane
Local_MAC_Addr_ctrl | 48 | - | Local Ethernet MAC address on the control plane
Supported_ST | 8 | 1 | The cross-connect rate of the switch (1 means OC1)
TO | 32 | - | The initial time-out value of the re-transmission timer
PTO | 32 | - | The time-out value of the piggy-back timer
4.2 Data tables
Fig. 26 is an illustrative network that shows: (i) switches can be connected via many
Figure 26. Network used for illustrative examples. A network of IP routers (not necessarily the Internet; it could be a specially designed IP network just for signaling traffic) connects the signaling engines of switches I, II, and III; a server-to-client SONET unidirectional circuit crosses the switches over logical links, with physical interface numbers (e.g., 5, 2, 7) marked on the links.
user-plane interfaces; (ii) logical links can be provisioned between non-adjacent switches; and (iii) a signaling engine may have many signaling links leading to the connectionless signaling network (IP network). We will extensively refer to this figure when we discuss the
data tables.
4.2.1 Routing table
The Routing table, as shown in Table 4, is initialized and maintained by a software
routing module (for example, an implementation of OSPF-TE for GMPLS). It is read-only
to the hardware signaling accelerator. The Next_IP_Addr_User is the user-plane IP
address for the next hop switch through which the destination IP address can be reached.
Routing table look-up is a longest-prefix match. In general, a routing table can have alternative next-hop switches. However, for simplicity, we do not implement multiple routing table look-ups in the hardware signaling accelerator. It attempts only one route; if resources are not available, it passes the message off to the software signaling process. We will re-consider this design decision in a subsequent release.
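The longest-prefix-match look-up on Table 4 can be sketched as a linear scan that keeps the matching row with the longest (numerically largest, for contiguous masks) subnet mask. A hardware implementation would use a TCAM or trie instead; the C below is our illustration, with field names taken from Table 4.

```c
#include <stdint.h>

typedef struct {
    uint32_t network_prefix;     /* index columns               */
    uint32_t subnet_mask;
    uint32_t next_ip_addr_user;  /* return value: next-hop IP   */
} route_entry;

/* Longest-prefix match: returns Next_IP_Addr_User, or 0 if no
 * entry matches the destination address. */
uint32_t route_lookup(const route_entry *tab, int n, uint32_t dest)
{
    uint32_t best_mask = 0, next_hop = 0;
    int found = 0;
    for (int i = 0; i < n; i++) {
        if ((dest & tab[i].subnet_mask) == tab[i].network_prefix &&
            (!found || tab[i].subnet_mask > best_mask)) {
            found = 1;
            best_mask = tab[i].subnet_mask;  /* longer prefix wins */
            next_hop = tab[i].next_ip_addr_user;
        }
    }
    return next_hop;
}
```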
Table 4. Routing table.
  Index: Network_Prefix, Subnet_Mask
  Return value: Next_IP_Addr_User

4.2.2 Incoming Connectivity table
The Incoming Connectivity table, shown in Table 5, is initialized and updated by a software link management module. Like the Routing table, it is read-only to the hardware signaling accelerator. It shows how the interface number used by a neighbor maps to an incoming interface number and the corresponding physical interface.

Table 5. Incoming Connectivity table.
  Index: Pre_IP_Addr_User, Pre_IF_ID_User
  Return value: Incoming_IF_ID_User, Incoming_PHY_ID, Incoming_Assigned_Timeslots

The Incoming_Assigned_Timeslots column is a bit-vector with '1's for the time-slots assigned to this logical interface and '0's for the other time-slots.
Using the example shown in Fig. 26, the incoming connectivity table at switch II will
have entries as shown in Table 6. Here we assume that time-slots 1-3 of the physical inter-
face 5 are terminated at switch II while the remaining time-slots 4-12 (we assume all inter-
faces are STS-12s) are used for the logical link passing through switch II to terminate at
switch III. On the other hand, we assume that all 12 time-slots of physical interface 2 ter-
minate at switch II. In cases where all time-slots of a physical interface terminate at a
switch, we use the same number for physical interface as for the interface ID used in sig-
naling messages. Otherwise, a logical interface number that maps to a physical interface
and time-slots is used in signaling messages to identify logical links.
4.2.3 Outgoing Connectivity table
The Outgoing Connectivity table, Table 7, shows the outgoing interfaces leading to
neighboring switches. It also maintains mapping of logical interface IDs to physical inter-
face IDs and corresponding time-slots. Since a switch may have multiple links to a neigh-
bor, there may be many rows in this data table corresponding to a single
Next_IP_Addr_User. Hence the Seq_Num column allows the hardware signaling accelerator to search this table repeatedly until it either finds an interface to the neighboring switch with sufficient available resources or exhausts all interfaces to that switch.

Table 6. Incoming Connectivity table at switch II in Fig. 26.
  Pre_IP_Addr_User     Pre_IF_ID_User  |  Incoming_IF_ID_User  Incoming_PHY_ID  Incoming_Assigned_Timeslots
  Switch I IP address  4               |  2                    2                1111 1111 1111
  Switch I IP address  6               |  1                    5                1110 0000 0000

Table 7. Outgoing Connectivity table.
  Index: Next_IP_Addr_User, Seq_Num
  Return value: Outgoing_IF_ID_User, Outgoing_PHY_ID, Outgoing_Assigned_Timeslots
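The Seq_Num search can be sketched as follows (a Python behavioral model, not the TCAM implementation; the table contents are hypothetical, loosely modeled on the switch I example):

```python
# Behavioral sketch of the Seq_Num search over the Outgoing Connectivity table.
# Keyed by (next_ip, seq_num); values: (outgoing_if_id, outgoing_phy_id, assigned_slots).

def find_outgoing_interface(conn_table, cac_table, next_ip, slots_needed):
    """Try Seq_Num = 0, 1, ... until an interface with enough free time-slots is found."""
    seq = 0
    while (next_ip, seq) in conn_table:
        if_id, phy_id, assigned = conn_table[(next_ip, seq)]
        # Free slots on this logical link: assigned to the link AND still available per CAC.
        free = [a and b for a, b in zip(assigned, cac_table[phy_id])]
        if sum(free) >= slots_needed:
            return if_id, phy_id
        seq += 1
    return None  # exhausted all interfaces to this neighbor

# Hypothetical tables (bit-vectors as 0/1 lists, MSB first).
conn = {
    ("switchII", 0): (6, 5, [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
    ("switchII", 1): (4, 4, [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
}
cac = {5: [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
       4: [0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1]}
```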
An example Outgoing Connectivity table for switch I in Fig. 26 is shown in Table 8. Time-slots 1-3 on physical interface 5 at switch I terminate at switch II, while the remaining time-slots are routed through as a logical link to switch III.

Table 8. Outgoing Connectivity table for switch I in Fig. 26.
  Next_IP_Addr_User      Seq_Num  |  Outgoing_IF_ID_User  Outgoing_PHY_ID  Outgoing_Assigned_Timeslots
  Switch II IP address   0        |  6                    5                1110 0000 0000
  Switch II IP address   1        |  4                    4                1111 1111 1111
  Switch III IP address  0        |  7                    5                0001 1111 1111

4.2.4 Outgoing CAC table
The Outgoing CAC table, Table 9, maintains the available time-slots on each physical outgoing interface. Table 10 shows an example Outgoing CAC table.

Table 9. Outgoing CAC table.
  Index: Outgoing_PHY_ID
  Return value: Avail_BW

Table 10. Outgoing CAC table for switch I in Fig. 26.
  Outgoing_PHY_ID  Avail_BW
  5                1111 1111 1000
  4                0110 0011 1111
  5                1111 1111 1111

Avail_BW is a bit-map of the available time-slots on a given outgoing interface: '1' means the corresponding time-slot is available, and '0' means it is not. Fig. 27 shows an example of Avail_BW. A request for an STS-1 can be satisfied, but a request for an STS-3c cannot: even though 8 time-slots are available, the contiguous-concatenation requirement cannot be met. "Reserving time-slots" means setting the corresponding bits to "0."
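These checks can be sketched in a few lines of Python (a behavioral model; the hardware performs the same tests combinationally):

```python
# Behavioral sketch of Avail_BW checks (12 time-slots per STS-12 interface).
# Avail_BW is a list of 12 bits, MSB first; '1' = available.

def can_reserve_sts1(avail, n):
    """n independent STS-1s need n available slots anywhere."""
    return sum(avail) >= n

def can_reserve_sts3c(avail):
    """An STS-3c needs one aligned group of 3 contiguous available slots."""
    return any(all(avail[i:i + 3]) for i in range(0, 12, 3))

def reserve(avail, positions):
    """Reserving time-slots means setting the corresponding bits to 0."""
    for p in positions:
        avail[p] = 0

# The Avail_BW of Fig. 27: 8 slots free, yet no aligned group of 3.
fig27 = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0]
```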
4.2.5 User/Control Mapping table
Unlike packet-switched MPLS networks, where implicit in-band signaling is used and there is a one-to-one association between control channels and data links, circuit-switched networks may require out-of-band signaling, and hence a separation of control channels from the corresponding data links. The User/Control Mapping table, Table 11, is used to map
user plane IP addresses of neighboring switches to the corresponding control plane IP
addresses.
Our implementation supports multiple data links between two neighboring LSRs, as
shown in Fig. 26. A switch may also have multiple control plane interfaces as shown in
Fig. 26 for switch II. We assume that load balancing of signaling load across multiple con-
trol plane interfaces will be done outside the signaling module and appropriate data down-
loaded to this User/Control Mapping table. We assume that there is only one
Next_IP_Addr_Ctrl associated with each next-hop switch.
Table 11. User/Control Mapping table.
  Index: Next_IP_Addr_User
  Return value: Next_IP_Addr_Ctrl

Figure 27. Example of Avail_BW: the bit-vector 1 1 0 1 1 0 1 1 0 1 1 0, where '1' means available and '0' means not available (reserved).
4.2.6 State table
The State table, shown in Table 12, is the most complex of the data tables. It consists of two parts: an index part (Src_IP_Addr, LSP_ID, Dest_IP_Addr, Tunnel_ID, and Ext_Tunnel_ID), which uniquely identifies an LSP, and a return-value part, which holds the many parameters of the LSP.
4.3 Procedures
After an RSVP-TE signaling message is parsed, fields from within its objects are sent to the registers defined in Section 4.1. The parsed message is then processed according to its message type; in this process, the data tables defined in Section 4.2 are consulted and updated. In this section, we use the Path message as an example to detail the processing of an RSVP-TE signaling message. Within the Path message, we focus on the processing of the SESSION object.
Fig. 28 shows the processing of the common header. After a message is completely received, the checksum and the protocol version are verified. If no error is detected, the fields in the common header are saved in registers. The message is then processed further according to the decoded message type. In this example, a Path message is detected (Msg_Type=1).

Table 12. State table.
  Index (global connection reference): Src_IP_Addr, LSP_ID, Dest_IP_Addr, Tunnel_ID, Ext_Tunnel_ID
  Return value, control plane: Pre_IP_Addr_Ctrl, LIH, Next_IP_Addr_Ctrl
  Return value, data plane: Pre_IP_Addr_User, Pre_IF_ID_User, Incoming_Assigned_Timeslots, Outgoing_Assigned_Timeslots, Incoming_IF_ID_User, Incoming_PHY_ID, Incoming_Label(s), Next_IP_Addr_User, Outgoing_IF_ID_User, Outgoing_PHY_ID, Outgoing_Label(s), Traffic_Spec, State
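The common-header handling described above can be sketched as follows (a Python behavioral model of the RFC 2205 common header, not the VHDL; the helper names are ours):

```python
# Behavioral sketch of common-header processing (Fig. 28): verify checksum and
# version, latch the header fields into "registers", and dispatch on Msg_Type.
import struct

MSG_NAMES = {1: "Path", 2: "Resv", 5: "PathTear", 6: "ResvTear"}

def ones_complement_checksum(data):
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return (~total) & 0xFFFF

def process_common_header(msg):
    vers_flags, msg_type, chksum, send_ttl, _, length = struct.unpack("!BBHBBH", msg[:8])
    if vers_flags >> 4 != 1:
        return None                       # unsupported protocol version
    if ones_complement_checksum(msg) != 0:
        return None                       # checksum error: drop the message
    # Registers: Msg_Type, Msg_Len, decremented Send_TTL.
    regs = {"Msg_Type": msg_type, "Msg_Len": length, "Send_TTL": send_ttl - 1}
    regs["Msg_Name"] = MSG_NAMES.get(msg_type, "other")
    return regs

def make_msg(msg_type, ttl, body=b""):
    """Build a message with a valid checksum, for testing the sketch."""
    length = 8 + len(body)
    hdr = struct.pack("!BBHBBH", 0x10, msg_type, 0, ttl, 0, length) + body
    ck = ones_complement_checksum(hdr)
    return hdr[:2] + struct.pack("!H", ck) + hdr[4:]
```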
The body of a Path message consists of several self-contained objects. Fig. 29 shows the processing of an object header. The Class-Num, C-Type, and Length of the object are extracted, and the object is processed further based on its type. In this example, it is a SESSION object (Class-Num=1).
Figure 28. Processing of the Common header. [Flowchart: a message arrives; Local_Chksum is calculated and compared with the RSVP checksum; the protocol version is checked (Vers=1?); the fields Msg_Type, Msg_Len, and (Send_TTL-1) are saved in registers; processing then branches on Msg_Type: 1 = Path, 2 = Resv, 5 = PathTear, 6 = ResvTear; other values are not handled.]
Figure 29. Processing of a Path message. [Flowchart: Obj_Class <- Class-Num, Obj_Type <- C-Type, Obj_Len <- Length; the object is dispatched on Obj_Class: 1 = SESSION, 3 = RSVP_HOP, 5 = TIME_VALUES, 11 = SENDER_TEMPLATE, 12 = SENDER_TSPEC, 19 = LABEL_REQUEST; other values are not handled.]
Fig. 30 shows the processing of the SESSION object. First the C-Type is verified; only C-Type 7 (IPv4 tunnel) is supported in the subset we defined. Then the fields inside the object are extracted and sent to the corresponding registers. The next step is to look up the Routing table with the given destination IP address. If a match is found, the returned value (the next-hop IP address) is used to search the Outgoing Connectivity table to find the corresponding outgoing interface. The Outgoing CAC table is then consulted to get the available bandwidth on that interface. With the bandwidth information fetched from the CAC table and the required traffic parameters extracted from the SENDER_TSPEC object, bandwidth (time-slot) reservation is attempted. If it succeeds, the reserved time-slots are marked, the Outgoing CAC table is updated, and a new SUGGESTED_LABEL object is created.
As mentioned in Chapter 1, setting up a connection involves five steps at each switch.
The first three steps, i.e., determining the next-hop switch, checking for the availability of
and reserving required resources, and assigning “labels” for the connection, are accom-
plished when a Path message is processed.
Figure 30. Processing of the SESSION object. [Flowchart: check that the C-Type is supported (7); load Dest_IP_Addr, Tunnel_ID, and Ext_Tunnel_ID from the object (Src_IP_Addr and LSP_ID come from the SENDER_TEMPLATE); check the State table for an existing entry F(State table, <Dest_IP_Addr, Tunnel_ID, Ext_Tunnel_ID, Src_IP_Addr, LSP_ID>); look up Next_IP_Addr_User <- F(Routing table, Dest_IP_Addr); fetch (Outgoing_IF_ID_User, Outgoing_PHY_ID, Outgoing_Assigned_Timeslots) <- F(Outgoing Connectivity table, <Next_IP_Addr_User, Seq_Num>); fetch Avail_BW <- F(Outgoing CAC table, Outgoing_PHY_ID); test whether Avail_BW satisfies <Signal_Type, MT, NCC> from the SENDER_TSPEC; if so, update Avail_BW (set the corresponding bits to 0's), write it back to the Outgoing CAC table, and assign a Suggested_Label.]
After all objects are successfully processed, the State table is updated, and a new Path message is assembled and transmitted to the next-hop address (control plane), as shown in Fig. 31.
Figure 31. At the end of Path message processing. [Flowchart: when all objects are done, update the State table, assemble the new Path message, and transmit it to the next hop.]
Chapter 5 An FPGA-Based Hardware Signaling
Accelerator
As mentioned before, complexity and the requirement for flexibility are two chal-
lenges for hardware implementations of signaling protocols. In Chapters 3 and 4, we
defined a subset of RSVP-TE for hardware acceleration, with the part beyond this subset
relegated to software. This approach can partly solve the challenge of complexity.
In order to address the challenge of flexibility, we propose to use re-configurable hard-
ware, i.e., FPGAs, as the hardware platform for our RSVP-TE implementation. These
devices are a compromise between general-purpose processors used in software imple-
mentations at one end of the flexibility-performance spectrum, and ASICs at the opposite
end of this spectrum. The latest FPGA devices, such as Xilinx Virtex-II [41][42], even
support runtime partial re-configuration [43]. We can re-configure FPGAs with updated
versions of implementations as signaling protocols evolve while significantly improving
the performance relative to software implementations on general-purpose processors.
Based on the subset defined in Chapter 3 and the detailed descriptions given in Chapter 4, we discuss an FPGA-based hardware signaling accelerator in this chapter. Section 5.1 gives an architecture-level view of the hardware signaling accelerator. We then detail the various functional modules in Section 5.2, present implementation and simulation results in Section 5.3, and finally discuss re-configurability in Section 5.4.
5.1 The architecture of the hardware signaling accelerator
Fig. 32 illustrates the architecture of the hardware signaling accelerator. It consists of three stages: message parsing, message processing, and message assembling. In the message parsing stage, signaling messages are buffered in the incoming message buffer for checksum verification; meanwhile, message objects are checked and their fields are dispatched to registers in the register bank. In the message processing stage, message objects are processed in parallel. Finally, the appropriate objects are re-assembled into a new message in the message assembling stage, and the new message is sent to the next switch. The three stages are fully pipelined to achieve high throughput.
5.2 Functional modules
In this section, we discuss the functional modules illustrated in Fig. 32.
5.2.1 Register bank
At the center of Fig. 32 we show a register bank. In Chapter 4, we described the registers that are necessary for our RSVP-TE implementation. Since message parsing, message processing, and message assembling are fully pipelined, each stage must be equipped with a set of registers. These three sets of registers form the register bank.

Figure 32. Architecture of the hardware signaling accelerator. [Block diagram: a GbE interface feeds the object dispatcher and incoming message buffer (message parsing); the register bank sits at the center, shared with the resource management (CAC table), data table management (TCAM and SRAM interfaces), and retransmission management modules (message processing); the message assembler (message assembling) sends the new message out; a PCI local bus interface, a FIFO interface, and a cross-connect interface connect to the rest of the system.]
Unlike in a regular pipelining scheme where each stage is attached to a fixed set of
registers, we propose a dynamic round-robin style pipelining scheme, as shown in Fig. 33.
In this scheme, a message, instead of a pipelining stage, is associated with a set of regis-
ters. Each register set has 3 flag bits indicating the pipelining stage of the register set and
associated signaling message. Similarly, each pipelining stage has 3 flag bits indicating
the register set associated with the message it is currently handling. When a message
enters the next pipelining stage, the associated flag bits change in a round-robin style. The
flag bits for each pipelining stage change in a similar round-robin style, but in the reverse
direction. For example, in Fig. 33, register sets 1, 2, and 3 are associated with three signaling messages, which are in the message assembling, message parsing, and message processing stages, respectively. When all stages are done with their current messages, the register sets enter the next stage, and all flag bits are updated.

Figure 33. Dynamic round-robin style pipelining. [Diagram: (a) regular pipelining; (b) round-robin style pipelining, in which register sets 1-3 in the register bank are connected to the message dispatching, processing, and assembling stages through a data flow arbitrator.]
All flag bits are connected to a data flow arbitrator, which controls the bidirectional
data flows between the register bank and three message handling units. Inside the data
flow arbitrator, there are 3 flag bits indicating the status of the three pipelining stages.
Only when all three stages are ready do the round-robin style flag bits rotate.
This scheme avoids data transfers between stages. It thus helps to reduce processing
delay and to increase throughput. The cost of this scheme is the extra arbitration logic and
flag bits.
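A minimal behavioral sketch of this rotation (Python, with the three-stage association modeled as a list rather than flag bits; the class and method names are ours):

```python
# Behavioral sketch of the dynamic round-robin pipelining scheme: a message
# stays in one register set; the stage <-> register-set binding rotates instead,
# so no data is ever copied between register sets.

STAGES = ["parsing", "processing", "assembling"]

class RegisterBank:
    def __init__(self):
        # stage_of_set[s] = index of the pipeline stage currently bound to
        # register set s (models the 3 flag bits per register set).
        self.stage_of_set = [0, 1, 2]

    def set_for_stage(self, stage_idx):
        """Which register set a stage is currently working on (the stage's flag bits)."""
        return self.stage_of_set.index(stage_idx)

    def rotate(self):
        """When all three stages are done, every register set enters the next
        stage; only the flag bits change, never the register contents."""
        self.stage_of_set = [(s + 1) % 3 for s in self.stage_of_set]
```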
5.2.2 Input message buffering system
In Fig. 32, parallel to the object dispatcher is an incoming message buffer. When a
message is received and dispatched to registers, it is simultaneously sent to this buffer,
which is a dual-port RAM inside the FPGA. The message is held in the buffer until it is
successfully processed, at which point it is flushed out. If any error occurs in the message parsing or processing stage, such as an unknown message/object/field, a routing table look-up failure, or a resource reservation failure, the message is moved to an external FIFO. The internal dual-port RAM, the external FIFO, and the related message buffer management module (also inside the FPGA) constitute a two-level message buffering system, as shown in Fig. 34. The internal dual-port RAM buffers the messages in the dispatching and processing stages, while the external FIFO stores the messages that cannot be handled by the hardware signaling accelerator and need software intervention. The external FIFO, which we will discuss in Chapter 6, works as an interface between the hardware signaling accelerator and the software signaling process.

Figure 34. Two-level input message buffering. [Diagram: an arriving message goes to the object dispatcher and, in parallel, to a 256 x 32 internal message buffer (dual-port RAM) inside the XC2V3000 FPGA; a message buffer management module moves error messages to a 64K x 32 external message buffer (FIFO).]
The internal dual-port RAM makes use of the BlockRAM resource in the Virtex-II FPGA device. The memory space is equally partitioned into four segments. Each segment is 256 bytes (64 x 32-bit words), large enough to contain a typical RSVP-TE message. Although at any given instant there are only two messages in the parsing and processing stages, there could be at most two error messages waiting to be transferred to the external FIFO; therefore the internal message buffer has four segments. The messages in the buffer are aligned to 256-byte segments instead of being stored contiguously. This approach simplifies buffer management by avoiding buffer fragmentation.
Each segment can be in one of four possible states: Free, in parsing stage (S1), in processing stage (S2), or Error. Correspondingly, we assign four flag bits to each segment, as shown in Fig. 35. Although the flag bits are physically associated with each buffer segment, the Free bits and the Error bits from all segments are logically organized into two separate queues. The "Free" segment queue indicates which segment is available to accommodate a new message. The "Error" segment queue indicates the segments waiting to be moved to the external FIFO. The queue pointers advance in round-robin style. The message buffer management module maintains the flag bits and the queues, controls the write/read operations to/from the internal dual-port RAM, and controls the write operation to the external FIFO.

Figure 35. Buffer management module. [Diagram: four 256-byte segments (64 32-bit words each, byte addresses 0-255), each with Free, S1, S2, and Error flag bits.]
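The segment states and the two queues can be sketched as follows (a Python behavioral model of the management logic; the class and method names are ours):

```python
# Behavioral sketch of the four-segment internal message buffer: each 256-byte
# segment is Free, S1 (parsing), S2 (processing), or Error; Free and Error
# segments are tracked as round-robin (FIFO) queues.
from collections import deque

class MsgBufMgr:
    def __init__(self):
        self.state = ["Free"] * 4
        self.free_q = deque([0, 1, 2, 3])   # segments available for a new message
        self.error_q = deque()              # segments waiting for the external FIFO

    def accept_message(self):
        seg = self.free_q.popleft()         # segment chosen for an arriving message
        self.state[seg] = "S1"
        return seg

    def advance(self, seg):                 # parsing done -> processing
        self.state[seg] = "S2"

    def finish(self, seg, ok):
        """Flush a processed segment, or queue it for the external FIFO on error."""
        if ok:
            self.state[seg] = "Free"
            self.free_q.append(seg)
        else:
            self.state[seg] = "Error"
            self.error_q.append(seg)

    def drain_error(self):                  # move one segment to the external FIFO
        seg = self.error_q.popleft()
        self.state[seg] = "Free"
        self.free_q.append(seg)
        return seg
```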
5.2.3 Two-level object dispatching
The TLV structure offers the flexibility required to introduce new objects as needed,
which is essential for protocols that keep evolving, like RSVP-TE. However, it adversely affects message parsing for parameter extraction, making it difficult to implement in hardware. For example, when processing an IP packet header, the hardware can always
extract the 5th word from the IP header to obtain the destination IP address. But in RSVP-
TE, since the SESSION object carrying the destination IP address can occur anywhere in a message, the hardware cannot extract the destination IP address from a fixed location.

Figure 36. TLV style object processing. [Diagram: an object dispatcher feeds nine per-object field dispatchers (e.g., a SESSION field dispatcher and an RSVP_HOP field dispatcher) and an unknown-object processor: two levels of dispatchers with distributed decoding.]
To address the challenge imposed by the flexible TLV structure, we propose a solution
with two-level dispatching and distributed decoding. Fig. 36 illustrates this solution; the figure is the RTL circuit generated by the Synplify synthesis tool from a VHDL model of our solution. There are two levels of dispatchers: an object dispatcher and nine field dispatchers, one per object type. The object dispatcher mainly consists of two registers and two counters.
The registers store the lengths of the message and the object being read. Two counters are
used to count the number of words received in an incoming message, one for the whole
message and the other for the current object. The message length counter and the message
length register are used to delimit a message. Similarly, objects are delimited according to
the object length register and the object length counter. The delimited object is sent to all
field dispatchers, while only the field dispatcher matching the object type is triggered (dis-
tributed decoding). The fields in the matched object are then dispatched into correspond-
ing registers. The unknown object processor captures all unsupported objects. If a
message contains such an object (i.e., one outside the set of objects defined in the RSVP-
TE subset), the message should be passed to the software signaling process.
5.2.4 Data table management
The processing of an RSVP-TE signaling message involves accessing multiple data
tables. Among the six data tables we defined in Chapter 4, the Outgoing CAC table, which we will discuss in the next section, is located entirely inside the FPGA. All other data
tables, including the Routing table, Incoming Connectivity table, Outgoing Connectivity
table, User/Control Mapping table, and State table, reside in an external Ternary Content
Addressable Memory (TCAM) device. Some data tables, such as the Routing table, User/
Control Mapping table, and State table, have extra data stored in an external Static RAM
(SRAM) device associated with the TCAM. The organization of these data tables inside
the TCAM and SRAM devices will be discussed in Chapter 6. Since multiple data tables share the same TCAM and SRAM devices, and since in the message processing stage different objects are processed concurrently, possibly requesting simultaneous access to different data tables, a data table management module is designed to arbitrate and sequence the requests for the different data tables.
The data table management module in the FPGA (see Fig. 32) implements a priority
arbitrator. We use three guidelines to set the priorities. First, if the look-up of a data table returns data that is used for look-ups in other data tables, the former has higher priority. Fig. 37 shows the dependencies among the data tables; from it we can conclude that the Routing table has higher priority than the User/Control Mapping table and the Outgoing Connectivity table. Second, if the look-up of a data table has follow-up look-ups while that of another does not, the former has higher priority. For this reason, the Routing table has higher priority than the Incoming Connectivity table, whose look-up involves no other table before the final State table update. Third, if a data table has more follow-up operations, it has higher priority. For this reason, the Outgoing Connectivity table has higher priority than the User/Control Mapping table: no further operation follows an access of the User/Control Mapping table, while the time-consuming resource reservation follows an access of the Outgoing Connectivity table.

Figure 37. Dependencies among the data tables. [Diagram: the Routing table look-up feeds the User/Control Mapping table and Outgoing Connectivity table look-ups; these, together with the Incoming Connectivity table look-up and a State table read, lead to the final State table write.]
Fig. 38 shows the priority arbitrator for the TCAM device. It has two interfaces. On
the left side, it interfaces to the other part of the hardware signaling accelerator, accepting
requests and sending out grant signals. Based on the priority, at any moment only one of
the requests can be satisfied. On the right side, it interfaces to the TCAM device. If there is
any request for a data table, the TCAM_Req is active. Since multiple data tables reside in
the same TCAM device, and typically a TCAM device is partitioned into segments, the
data tables can be naturally fitted into different segments. The table index indicates the
segment of the TCAM in which each data table resides; it does not necessarily reflect the
priority. The TCAM interface is very complex, and we only describe the priority arbitrator
related signals in Fig. 38.
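The grant logic can be sketched as follows (a Python behavioral model; the priority ordering follows the high-to-low listing of Fig. 38, and the table names are ours):

```python
# Behavioral sketch of the fixed-priority TCAM arbiter: of all asserted
# requests, only the highest-priority one is granted; Table_Idx selects the
# TCAM segment in which the granted table resides.

# Priority order (high to low) and TCAM segment indices, as in Fig. 38.
PRIORITY = ["state", "routing", "outgoing_conn", "incoming_conn", "uc_mapping"]
TABLE_IDX = {"state": 0b000, "routing": 0b010, "outgoing_conn": 0b011,
             "incoming_conn": 0b100, "uc_mapping": 0b101}

def arbitrate(requests):
    """requests: set of table names with their Req line asserted.
    Returns (granted_table, tcam_req, table_idx)."""
    for table in PRIORITY:
        if table in requests:
            return table, True, TABLE_IDX[table]
    return None, False, None              # no request: TCAM_Req stays inactive
```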
5.2.5 Resource (bandwidth) management
Signaling protocols are used in connection-oriented networks to set up and tear down
connections. As mentioned in Chapter 1, five steps are involved at each switch to set up a
connection. After the first step, i.e., route look-up, a switch checks the availability of
resources (bandwidth and optionally buffer space) on the determined output interface. If resources are available, the switch reserves the required resources. When tearing down a connection, the switch releases the allocated resources.

Figure 38. Priority arbitrator for TCAM. [Diagram: request (Req) and grant (Gnt) signal pairs for the State table, Routing table, Outgoing Connectivity table, Incoming Connectivity table, and User/Control Mapping table enter the priority arbitrator in high-to-low priority order; the arbitrator drives TCAM_Req and Table_Idx toward the TCAM. Table indices: State table 000, Routing table 010, Outgoing Conn table 011, Incoming Conn table 100, User/Ctrl Mapping table 101.]
The hardware signaling accelerator targets SONET switches, in which case
“resources” are the time-slots on the output interface. Fig. 39 shows the resource manage-
ment module implemented as part of the hardware signaling accelerator. It consists of a
CAC table and a bandwidth allocator. The CAC table maintains the available time-slots on
each output interface. Our target switch fabric has 64 output interfaces (see Chapter 6),
each with 12 time-slots (STS-12 interface). Therefore the CAC table has 64 entries, each
with a bit-vector of 12 bits. A bit value of '1' indicates that the corresponding time-slot is available; '0' indicates that it is not. The CAC table is small enough to fit into on-chip memory in the FPGA. This is also desirable since the CAC table is tightly coupled with the bandwidth allocator, in the sense that each request accesses the CAC table twice: the bit-vector is first read out, and after resource reservation or release, the updated bit-vector is written back to the CAC table.
A hierarchical bandwidth allocator, which consists of a STS-1 designator, a STS-3c
designator, and a STS-12c designator, is designed because of the hierarchy and concatena-
tion features of SONET signals. There are four 3-input AND gates between STS-1 desig-
nator and STS-3c designator and one 4-input AND gate between STS-3c designator and
STS-12c designator. A priority decoder (MSB has the highest priority) is used to select time-slots as per the request.

Figure 39. Resource management module. [Diagram: the CAC table (64 x 12-bit entries) feeds the bandwidth allocator; the 12 STS-1 availability bits are AND-reduced in groups of three into four STS-3c bits, which are AND-reduced into one STS-12c bit.]

For example, if there is a request for 3 STS-1s on output inter-
face 25 (“011001”), we first get the entry corresponding to output interface 25 in the CAC
table. The entry is then copied to the bandwidth allocator. The designator corresponding to
STS-1 is checked and three time-slots are reserved, as shown in Fig. 40a. If the request is
for an STS-3c, the STS-3c designator is checked and time-slots reserved. If an STS-12c is
requested, the STS-12c designator determines that this request cannot be satisfied. After
resource reservation, the updated bit-vector is written back to the CAC table.
A priority decoder gives the best chance that contiguous concatenation at a higher level of the hierarchy will not be broken by allocations at a lower level. For example, in Fig. 40b, an allocation scheme other than priority decoding is used, leaving no availability at the STS-3c level; a new request for an STS-3c cannot be fulfilled even though enough time-slots are available.
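The hierarchical allocator can be sketched in Python (a behavioral model of the AND-reduction and the MSB-first priority decoder; the function names are ours):

```python
# Behavioral sketch of the hierarchical bandwidth allocator: AND-reduce the
# 12 STS-1 availability bits into 4 STS-3c bits and 1 STS-12c bit, then pick
# slots with an MSB-first priority decoder.

def designators(avail):                      # avail: 12 bits, MSB first
    sts3c = [int(all(avail[i:i + 3])) for i in range(0, 12, 3)]   # 3-input ANDs
    sts12c = int(all(sts3c))                                       # 4-input AND
    return sts3c, sts12c

def allocate(avail, kind, n=1):
    """Reserve n STS-1s, or one STS-3c/STS-12c; returns slot positions or None."""
    sts3c, sts12c = designators(avail)
    if kind == "STS-1":
        picks = [i for i, bit in enumerate(avail) if bit][:n]      # MSB-first
        if len(picks) < n:
            return None
    elif kind == "STS-3c":
        groups = [g for g, bit in enumerate(sts3c) if bit]
        if not groups:
            return None
        picks = list(range(groups[0] * 3, groups[0] * 3 + 3))      # MSB-first group
    elif kind == "STS-12c":
        if not sts12c:
            return None
        picks = list(range(12))
    for p in picks:
        avail[p] = 0                         # reserving = clearing the bits
    return picks

# The example of Fig. 40a: reserve 3 STS-1s MSB-first.
a = [0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1]
```

Reserving the three MSB-most STS-1s leaves the two high STS-3c groups intact, reproducing the Fig. 40a outcome.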
5.2.6 Re-transmission management
Timers are required to support the solution proposed in RFC 2961 [40] for reliable
message transmission. As discussed in Chapter 3, this solution requires every message to
carry a MESSAGE_ID object, which is acknowledged by MESSAGE_ID_ACK object car-
ried in an Ack message or piggy-backed in a message in the reverse direction. If the
MESSAGE_ID_ACK is not received before a retransmission timer times out at the sender,
the message is re-transmitted. A second timer, which we call piggyback timer, is used to
hold MESSAGE_ID_ACK objects awaiting a message to be sent in the reverse direction, to avoid unnecessary Ack messages. An Ack message is generated only if this timer expires.

Figure 40. Comparison of resource allocation schemes. [Diagram: starting from Avail_BW = 0 0 1 1 0 1 1 1 1 1 1 1 (MSB to LSB; STS-3c level 0 0 1 1, STS-12c level 0): (a) a priority-decoder based allocator reserves 3 STS-1s from the MSB side, leaving 0 0 0 0 0 0 1 1 1 1 1 1 with STS-3c level 0 0 1 1; (b) a round-robin style allocator leaves 0 0 1 1 0 1 1 0 0 0 1 1 with STS-3c level 0 0 0 0, so no STS-3c can be allocated.]
Fig. 41 illustrates the proposed re-transmission management scheme. The hardware signaling accelerator maintains a system timer, t_SYS, which provides system timing for all other timers. On the transmitting side, when a signaling message is sent out, the message, together with a time tag marking the transmitting time (the t_SYS value at the moment the message is transmitted), is also copied to the unacknowledged message buffer. The buffer is composed of equal-sized blocks, one block per message, and organized as a queue (first in, first out); therefore the head of the queue always contains the oldest message. The time tag of the head message is copied to the retransmission timer, timer_0. Assuming the initial time-out value is TO (this value is kept in a register and set at initialization time), when t_SYS >= timer_0 + TO, the re-transmission timer times out, and the message is retransmitted and copied to the buffer for the first re-transmission. The first re-transmission buffer is organized in a similar way, with a retransmission timer, timer_1, and a time-out value of 2TO (exponential back-off). Following [40], we support at most three re-transmissions.

Figure 41. Re-transmission management (buffers and timers). [Diagram: on the transmitting side, FIFO queues of (message, time tag) entries, from the unacknowledged message buffer with timer_0 through the buffer for the 2nd retransmission with timer_2; on the receiving side, per-neighbor FIFO buffers of MESSAGE_ID_ACKs (neighbor 0 through neighbor k) with piggyback timers p_0 through p_k.]

On the receiving side, each received message must be acknowledged with a
MESSAGE_ID_ACK object. The acknowledgment can be delayed (bounded by the piggyback time-out value) so that a MESSAGE_ID_ACK object can be piggybacked in a message in the reverse direction, or multiple MESSAGE_ID_ACKs can be packed into one Ack message. For this purpose, one buffer is allocated per neighbor. Each entry in buffer i is a MESSAGE_ID_ACK object destined for neighbor i. Associated with each entry is a time tag indicating the time at which the message carrying the corresponding MESSAGE_ID was received. Entries in each buffer are time-ordered given the FIFO nature of these buffers. For each neighbor i, we maintain a piggyback timer p_i, which keeps the time tag value of the head entry. Assuming the piggyback time-out value is PTO (this value is kept in a register and set at initialization time), when t_SYS >= p_i + PTO, the piggyback timer p_i times out. If this happens or the buffer is full, an Ack message is generated to carry all the MESSAGE_ID_ACKs in the buffer. After these MESSAGE_ID_ACKs are flushed from the buffer, the piggyback timer is reset to the time tag value of the next available head entry. If there is a message destined to neighbor i before p_i times out, the outstanding MESSAGE_ID_ACKs for this neighbor can be piggybacked and sent with the message (within, of course, the maximum message length limit).

The re-transmission timers and piggyback timers work in a similar way, but with important differences. There are three re-transmission timers (and associated queues) in the system, each corresponding to one re-transmission attempt, and the time-out value doubles each time. In contrast, the number of piggyback timers depends on the number of neighbors; each neighbor has a separate piggyback timer and queue, and the time-out values of these piggyback timers are all the same.
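The timer mechanics can be sketched as follows (a Python behavioral model; the TO and PTO values and queue/function names are hypothetical):

```python
# Behavioral sketch of the re-transmission and piggyback timers: FIFO queues of
# (time_tag, item) pairs; a queue's timer is simply its head entry's time tag,
# compared against the system time t_sys plus the queue's time-out value.
from collections import deque

TO, PTO = 100, 40                  # initialization-time register values (hypothetical)

def expired(queue, t_sys, timeout):
    """timer = head time tag; expires when t_sys >= timer + timeout."""
    return bool(queue) and t_sys >= queue[0][0] + timeout

def on_timeout(retx_queues, t_sys):
    """Move the head message of queue k to queue k+1 with exponential back-off
    (time-outs TO, 2TO, 4TO); at most three re-transmissions are attempted."""
    for k, q in enumerate(retx_queues):
        if expired(q, t_sys, TO << k):
            _, msg = q.popleft()
            if k + 1 < len(retx_queues):
                retx_queues[k + 1].append((t_sys, msg))   # re-transmit and re-queue
            return msg
    return None

retx = [deque(), deque(), deque()]              # one queue per re-transmission attempt
retx[0].append((0, "Path#1"))                   # sent at t_sys = 0
```

The per-neighbor piggyback queues use the same `expired` test with PTO as the time-out value.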
5.3 Implementation and simulation results
As a proof of concept, we developed a prototype VHDL model for the hardware sig-
naling accelerator, used Synplify for synthesis and Xilinx ISE for placement and routing.
The implementation uses of the FPGA resources (Xilinx XC2V3000) without the
PCI core, which we plan to include on the FPGA. Table 13 shows the detailed implemen-
tation results.
We performed timing simulations of the hardware signaling accelerator using the Model-
Sim simulator. Fig. 42-Fig. 44 show the simulation results for the processing of Path, Resv,
and PathTear messages, respectively. Processing the Path message, which involves the
access and updating of the data tables, takes 37 clock cycles. The time to receive a Path
message (a message is parsed while it is being received) is 40 clock cycles, as is the time
to transmit the outgoing Path message (a message is transmitted while it is being assem-
bled). Receiving, processing, and transmitting all other messages each takes no more than
40 clock cycles. A detailed breakdown of the typical processing time for each message is
shown in Table 14.
Because the parsing, processing, and assembling stages are fully pipelined, idle cycles
are inserted if a stage is not ready. As a worst-case estimate, the total time for parsing,
Table 13. Implementation results.
Device     PCI core   Resource   Eq. Gates   Max freq.
XC2V3000   w/o PCI    12%        360,000     90MHz
           w/ PCI     21%        630,000     50MHz
Table 14. Clock cycles to receive an RSVP-TE message.
               Path   Resv   PathTear/ResvTear
Clock cycles   40     32     19
processing, and assembling a single message consumes 120 clock cycles, which is 2.4
microseconds with a 50MHz clock. Since connection setup/release requires the handling
of three signaling messages, we require a total of 7.2 microseconds per call. The call han-
dling capacity is as high as 400,000 calls/sec because of pipelining, which allows the sys-
tem to accept a new message every 40 clock cycles.
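The arithmetic behind these figures can be checked directly (a Python sketch, with variable names of our choosing; the exact pipelined capacity is 50MHz / 120 cycles ≈ 416,666 calls/sec, which the text conservatively rounds to 400,000):

```python
# Throughput arithmetic from the timing simulations: 120 clock cycles per
# message at a 50MHz clock, three messages per call, and (thanks to
# pipelining) a new message accepted every 40 clock cycles.
CLOCK_HZ = 50e6
CYCLES_PER_MESSAGE = 120          # worst-case parse + process + assemble
MESSAGES_PER_CALL = 3             # setup/release handles three signaling messages
PIPELINE_INTERVAL_CYCLES = 40     # a new message can enter every 40 cycles

message_time_us = CYCLES_PER_MESSAGE / CLOCK_HZ * 1e6    # 2.4 us per message
call_time_us = message_time_us * MESSAGES_PER_CALL       # 7.2 us per call
# Pipelined capacity: one message drained every 40 cycles, three per call.
calls_per_sec = CLOCK_HZ / (PIPELINE_INTERVAL_CYCLES * MESSAGES_PER_CALL)
```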
Figure 42. Processing of Path message.
Figure 43. Processing of Resv message.
Previousstage ready
TCAMinterface
SRAMinterface
Switch fabricinterface
Resvmessage
Read Statetable
Programswitch fabric
End ofprocessing
5.4 Re-configurability
The reason we choose FPGA as the hardware platform for the signaling accelerator is
its re-configurability. The FPGA device we use, Xilinx Virtex-II, can support runtime par-
tial re-configuration, meaning we can re-configure it even when it is working in the field
and we can selectively re-configure part of the device and keep the other part unchanged.
For example, our hardware signaling accelerator targets SONET switches. Specifi-
cally, it is designed for the Vitesse 64x64 switch fabric, with STS-12 interfaces and an STS-1
cross-connect rate. Accordingly, the Incoming/Outgoing Connectivity tables each have 64
entries, corresponding to 64 interfaces. The CAC table also has 64 entries for the 64 outgoing
interfaces, each with 12 bits for the 12 time-slots in an STS-12 signal. The bandwidth alloca-
tor consists of a 3-level hierarchy, the lowest level corresponding to STS-1 (the cross-con-
nect rate) and the highest level corresponding to STS-12c (the maximum interface rate). If we
are going to support a 512x512 switch fabric with OC-192 interfaces and an OC-3c cross-
connect rate (such as the Sycamore® SN16000 [44]), we need to expand the Incoming/Outgo-
ing Connectivity tables and the CAC table to 512 entries. In the CAC table, each entry has 64 bits,
Figure 44. Processing of PathTear message.
each bit corresponding to an OC-3c signal. We also need to re-design the bandwidth allo-
cator. The new bandwidth allocator consists of 4 levels, the lowest level corresponding to
the OC-3c cross-connect rate, followed by OC-12c and OC-48c, with OC-192c as the
highest level. All other parts of the hardware signaling accelerator can remain unchanged.
A Xilinx Virtex-II FPGA can be re-configured in several milliseconds [43], which
makes hardware upgrading as easy as software upgrading.
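The CAC-table scaling rule above can be captured in a short sketch (Python; the function name and the unit convention, rates expressed in multiples of STS-1/OC-1, are our assumptions):

```python
# Sketch of the CAC-table sizing rule: one entry per outgoing interface and
# one bit per time-slot, where a time-slot is one unit of the cross-connect rate.
def cac_table_size(num_interfaces, interface_rate, crossconnect_rate):
    """Return (entries, bits_per_entry); rates in multiples of STS-1/OC-1."""
    return num_interfaces, interface_rate // crossconnect_rate

# Current design: 64 STS-12 interfaces, STS-1 cross-connect -> 64 entries x 12 bits.
current = cac_table_size(64, 12, 1)
# SN16000-class fabric: 512 OC-192 interfaces, OC-3c cross-connect -> 512 x 64 bits.
upgraded = cac_table_size(512, 192, 3)
```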
Chapter 6 Design of a Prototype Switching System
In Chapter 1, we showed the unfolded view of a typical switching system. A complete
switching system includes both control-plane and user-plane devices. The user plane, car-
rying user traffic, typically consists of line cards terminating the incoming and outgoing
user traffic, and a switch fabric switching user traffic from an incoming interface to an
outgoing interface. The control-plane, on the other hand, controls the operation of the
user-plane. The control plane typically implements functionalities such as routing, signal-
ing, and link management.
In order to demonstrate a complete switching system equipped with our FPGA imple-
mentation of the hardware signaling accelerator, we design a prototype switching system. It is
a PCI expansion board including both control-plane and user-plane devices. The control-
plane module implements the subset of RSVP-TE we defined in Chapter 3. The user-plane
device, a Vitesse 64x64 switch fabric, provides 64x64 switching capability under the con-
trol of the hardware signaling accelerator. The prototype switching system is connected to
a host computer through the PCI local bus. The host computer runs the software signaling
process. We have not implemented routing and link management modules in this switch-
ing system, but assume that these modules can be added as software modules on the host
computer. These modules initialize and maintain the Routing table and Incoming/Outgo-
ing Connectivity tables on the PCI board.
In this chapter, we discuss the design of the prototype switching system. We first sum-
marize the architecture of the prototype board, and then give the details of the main func-
tional devices. We also present the on-board clock and power distribution schemes, both
of which are essential for high-speed PCB design.
6.1 Architecture of the PCI-Based prototype switching system
Fig. 45 shows the architecture of the proposed prototype switching system. It is a PCI
expansion board. The core component on the board is the hardware signaling accelerator,
which is a Xilinx XC2V3000 FPGA [41]-[43]. It implements the RSVP-TE subset
described in Chapters 3 and 4 and interfaces to the other devices on this board. We only
implement a single signaling channel, on which messages are received and transmitted.
This is a Gigabit Ethernet (GbE) interface. RSVP-TE messages are carried within IP
packets and arrive on this interface. Thus messages sent to different neighboring switches
can all be carried on the same GbE interface and switched through an IP router or Ethernet
switch to their appropriate destinations. More details on the selection of signaling chan-
nel(s) and the analysis of signaling transport mechanisms are given in Chapter 7. The opti-
cal transceiver, SerDes (Serializer/Deserializer), and GbE Media Access Control (MAC)
chips shown in Fig. 45 form the signaling channel.
Processing of RSVP-TE messages involves many data tables. These data tables can be
fairly large. Therefore we include a TCAM and an SRAM (SRAM0) on our prototype
board to hold these data tables. The data tables in these memory chips are initialized from
Figure 45. Architecture of the prototype board.
the host system connected to this board through the PCI bus. The SRAM1 is used to buffer
the transmitted but as-yet unacknowledged messages. Specifically, we choose IDT®
IDT75P52100, a 64K x 72 Network Search Engine (NSE) device, as our TCAM, and
IDT® IDT71V2556, a 128K x 36 SRAM device.
The FIFO is the main interface unit between the hardware signaling accelerator and
the software signaling process. It is used to buffer signaling messages that cannot be han-
dled by hardware. If any error occurs in the message parsing or processing stage, the mes-
sage is queued in this FIFO. The software signaling process will fetch the messages from
the FIFO and process them as needed.
Finally, we include a SONET switch fabric as the user-plane device (note that in
Chapter 3, we stated that this RSVP-TE subset is being designed for such a SONET
switch). Specifically, we selected the Vitesse VSC9182 device, which is a 64x64 switch
fabric with STS-12 interfaces and an STS-1 cross-connect rate. In a full implementation,
the switch fabric will require its input and output ports to be connected to O/E and E/O
devices for termination of optical fiber links. Since our prototype board is focused on the
control-plane aspects of the switch, we do not test actual user data flow through this
device. Instead we include the user-plane device mainly to test whether or not our RSVP-
TE hardware signaling accelerator can perform the functions of setting up and deleting
cross-connections through the user-plane switch fabric device.
6.2 Main on-board devices
6.2.1 1 Gbps signaling channel [45][46][47]
The 1Gbps signaling channel consists of a GbE MAC, a SerDes, and an optical transceiver,
as shown in Fig. 46. The optical transceiver, the Agilent® HFCT-53D5, provides conversion
between electronic and optical signals. The SerDes, the Agilent HDMP-1636A, converts
between the 8B/10B encoded parallel signals (125MHz) and the high-speed serial signal
(1250MHz). Since the signals are 8B/10B encoded, the coding efficiency is 80%; the
serial data rate is 1250MBd/sec, or 1000Mbps (1Gbps).
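The line-rate arithmetic can be checked as follows (a Python sketch; variable names are ours):

```python
# 8B/10B line-rate arithmetic for the 1Gbps signaling channel.
parallel_clock_mhz = 125                    # one 10-bit code word per clock
serial_rate_mbd = parallel_clock_mhz * 10   # 1250 MBd on the serial line
coding_efficiency = 8 / 10                  # 8 payload bits per 10 line bits (80%)
data_rate_mbps = serial_rate_mbd * coding_efficiency   # 1000 Mbps = 1 Gbps
```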
The LSI Logic® L8104 device is one of the most widely used GbE MAC controllers.
Figure 46. 1Gbps signaling channel.
Figure 47. Block diagram of L8104.
Fig. 47 shows the block diagram of the L8104. Internally, it consists of the GbE MAC, an 8B/10B
Physical Coding Sublayer (PCS), a 16K-byte receive FIFO, and a 4K-byte transmit FIFO.
Externally, it has three interfaces: a 32-bit system interface connecting to the hardware sig-
naling accelerator (interface signals are given in Table 15), a 10-bit PHY interface con-
necting to the SerDes, and a 16-bit register interface connecting to the FPGA for
configuration purposes.
In Chapter 5, the simulation results showed that the call handling rate of the hardware
signaling accelerator can be as high as 400,000 calls/sec. This conclusion is based on a
working clock of 50MHz, which is the maximum clock frequency of the hardware signal-
ing accelerator. In the prototype system, limited by the single 1Gbps signaling channel,
the call handling rate is less. For the 32-bit GbE interface, the working clock is 33MHz
(32-bit × 33MHz ≈ 1Gbps). Consequently, the call handling rate is 250,000 calls/sec.
6.2.2 Incoming message buffer - FIFO device [48]
In Chapter 5, we introduced a two-level incoming message buffering system. We also
discussed the dual-port RAM and the related buffer management module inside the
FPGA. In this section, we discuss the external message buffer, i.e. the FIFO device, to
complete the discussion of this two-level incoming message buffering system. Compared
Table 15. L8104 system interface.
Transmit Interface                            Receive Interface
Signal     I/O  Description                   Signal     I/O  Description
TXD[31:0]  I    Transmit data                 RXD[31:0]  O    Receive data
TXBE[3:0]  I    Transmit byte enable          RXBE[3:0]  O    Receive byte enable
TXENn      I    Transmit enable               RXENn      I    Receive enable
TXSOFn     I    Transmit start of frame       RXSOFn     O    Receive start of frame
TXEOFn     I    Transmit end of frame         RXEOFn     O    Receive end of frame
TXWM1      O    Transmit FIFO watermark 1     RXWM1      O    Receive FIFO watermark 1
to the internal dual-port RAM, the external FIFO is much larger (64K x 36 compared to
256 x 32). And unlike the internal dual-port RAM, the external FIFO is not segmented.
Instead, the messages are stored in the FIFO continuously.
Fig. 48 shows the block diagram of the IDT72V36100. It consists of a 64K x 36 RAM array,
a write pointer, a read pointer, and the related control logic and flag logic. It has two inde-
pendent data ports, an input port and an output port. Table 16 lists the main interface sig-
nals on both ports.
The input port interfaces with the dual-port RAM and buffer management module,
which are part of the two-level message buffering system inside the FPGA. The
D[31:0] bus connects to the output of the internal dual-port RAM. The WEN signal indicates that
there is data on the input port waiting to be written. If there is an error message in the
internal dual-port RAM (recall the Error flag bits we defined), WEN is active. The FF signal
indicates that the FIFO is full and cannot accommodate more messages.
Table 16. Main interface signals of IDT72V36100.
Input Port                                     Output Port
Signal   I/O  Description                      Signal   I/O  Description
D[31:0]  I    Data input                       Q[31:0]  O    Data output
WCLK     I    Write clock                      RCLK     I    Read clock
WEN      I    Write enable                     REN      I    Read enable
FF       O    Full flag                        EF       O    Empty flag
PAF      O    Programmable almost-full flag    PAE      O    Programmable almost-empty flag
Figure 48. Block diagram of IDT72V36100.
The output port interfaces with the PCI module residing inside the FPGA, through
which the software signaling process can fetch the messages in the FIFO. The software
module has two options to fetch messages: through interrupts or via polling. With inter-
rupts, the hardware signals the software if there are messages waiting to be fetched, and
the software responds upon request. With polling, the software polls the prototype board
periodically. Since we do not expect many messages to require software intervention, we
choose the interrupt scheme instead of polling.
The Q[31:0] bus connects to the PCI module inside the FPGA. A register Q[31:0] inside
the FPGA is mapped to an address in the PCI memory space (BASE + 0x00000010 in our
design, where BASE is the PCI card base memory address assigned by the operating system).
The software signaling process (including a device driver) can read messages from this
address. When the software signaling process is ready to process the interrupt from the
FIFO, the PCI module activates REN. The design of the interrupt-handling mechanism
needs further work. If the occurrence of messages needing software intervention is rare,
the EF signal can be used to trigger an interrupt. Whenever there is a message waiting in
the FIFO (EF is not active), an interrupt is generated. On the other hand, if the occurrence
of such messages is frequent, the rate of interrupts could be too high. As an alternative, we
can use PAF to trigger an interrupt. On the PCI local bus, there is a parameter called the PCI
Latency Timer. It determines the maximum number of clock cycles a PCI device can hold
the PCI bus. A typical value of the PCI Latency Timer is 32, which is the default value for
most main boards. The number of PCI devices in a personal computer system is typically
fewer than six. The maximum response time to an interrupt is thus 192 clock cycles
(assuming that the prototype board is the last PCI device polled and all other PCI devices
use their full 32-clock cycle allocation). Therefore, we set the PAF threshold to 256 (larger
than 192). When the FIFO is almost full, i.e., there are only 256 or fewer spaces available,
the PAF signal will be active and trigger an interrupt.
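The threshold arithmetic above can be sketched as a quick check (Python; the variable names are ours):

```python
# Worst-case PCI interrupt-response arithmetic behind the PAF threshold choice.
LATENCY_TIMER_CYCLES = 32   # default PCI Latency Timer on most main boards
MAX_PCI_DEVICES = 6         # typical upper bound in a personal computer

worst_case_cycles = LATENCY_TIMER_CYCLES * MAX_PCI_DEVICES   # 192 cycles
PAF_FREE_ENTRIES = 256      # trigger when only this many FIFO entries remain free
headroom = PAF_FREE_ENTRIES - worst_case_cycles              # 64 entries of margin
```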
6.2.3 Data tables - the TCAM and SRAM devices [49][50]
In Chapter 5, we discussed the data table management module inside the FPGA that
manages the data tables residing in the external TCAM and SRAM devices. In this sec-
tion, we first introduce the TCAM and SRAM devices we use, and then detail the organiza-
tion of the data tables in these devices.
6.2.3.1 The TCAM and SRAM devices
TCAM stands for Ternary CAM, where CAM stands for Content Addressable Mem-
ory. A CAM, like a RAM, is a memory device used to store data, though it works in a dif-
ferent manner. In a RAM, the user supplies an address and gets back the data. With a CAM,
on the contrary, the user supplies the data and gets back the address. Fig. 49 com-
pares a RAM and a CAM. Typically, a CAM works in conjunction with a RAM. The
matched address obtained from the CAM look-up is used to fetch the data stored in the
associated RAM, as shown in Fig. 49b. This feature makes CAMs suitable for table look-
ups. For example, in our hardware signaling accelerator, to find the next hop corresponding
to a destination IP address, we use the destination IP address for a CAM look-up and
then obtain the next-hop address stored in an associated RAM device.

Figure 49. RAM and CAM. (a) Indexing a RAM: supplying the address 0x7FFE returns
the data stored there (0xABCD5678). (b) Looking up a CAM: supplying the data
0xABCD5678 returns the matching address (0x7FFE), which indexes the associated RAM.
Unlike a CAM, where each binary digit (or bit, for short) can hold only one of two
values, '0' or '1', a TCAM stores ternary digits, each of which can hold one of three values: '0', '1', or
'X' (don't care). In practice, a ternary digit is often represented by two binary digits (2
bits). Therefore a TCAM array typically consists of two memory arrays, a data array and a
mask array. A bit value of '0' in the mask array indicates that the corresponding ternary digit is
'X' no matter what the bit value is in the data array. On the other hand, if a bit value in the mask
array is '1', the corresponding ternary digit value is determined by the bit value in the data
array. A TCAM digit of 'X' is ignored when performing a content match.
Fig. 50 illustrates how a TCAM works. When a search word, e.g., 128.238.34.219, is
given, it is first ANDed with the masks in the mask array and then compared to the whole data
array to find a match. If multiple matches are found (in this example we have 3 matches),
a priority decoder then selects the one with the highest priority (hence 0x1234 is the
matched address). TCAMs are especially suitable for IP routing table look-up, where a
Longest Prefix Match (LPM) is required. The routing table is sorted according to the
length of the network prefix, with the longest-prefix entries located at the top (highest prior-
ity). The network prefix is stored in the data array, and the subnet mask is stored in the
mask array. In this way, if there is a match, it is always the longest one.
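The mask-then-compare behavior described above can be modeled in a few lines (a Python sketch; the function and helper names are ours, and the entries reproduce the Fig. 50 example):

```python
# Minimal model of a TCAM longest-prefix match. Each entry is a (data, mask)
# pair; zeroed mask bits are "don't care". Entries are stored longest-prefix
# first, so the priority decoder is simply "first hit wins".
def tcam_lookup(entries, search_word):
    """Return the address (index) of the highest-priority match, or None."""
    for address, (data, mask) in enumerate(entries):
        if (search_word & mask) == (data & mask):
            return address
    return None

def ip(dotted):
    a, b, c, d = (int(x) for x in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

# The routing-table example of Fig. 50, sorted long-to-short prefix:
entries = [
    (ip("128.238.34.208"), ip("255.255.255.240")),   # /28, highest priority
    (ip("128.238.34.0"),   ip("255.255.255.0")),     # /24
    (ip("128.238.0.0"),    ip("255.255.0.0")),       # /16
]
```

Searching for 128.238.34.219 matches all three entries, and the priority decoder returns the first, i.e. the longest prefix.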
In our prototype system, we choose IDT75P52100 and IDT71V2556 as our TCAM
and SRAM devices. The hardware signaling accelerator, TCAM, and SRAM are con-
nected as shown in Fig. 51. When there is a request for a data table (recall the priority arbi-
trator in the data table management module), the hardware signaling accelerator sends out
a request, together with appropriate command and data to the TCAM. The TCAM device
will send back an acknowledgment signal if there is a match. The matched address can be
used to fetch the extra data stored in the SRAM device. Some data tables, i.e. the Incoming
and Outgoing Connectivity tables, have no associated data stored in the SRAM. In this
case, part of the matched address is returned to the hardware signaling accelerator.
The command bus consists of three parts, as shown in Fig. 52. INST[2:0] specifies the
instruction type (“010” for look-up), LTIN[1:0] specifies the width of the look-up (“00”
for 72-bit look-up), and SEGSEL[2:0] specifies which Segment Select Register will be
Figure 50. How a TCAM works.
Figure 51. FPGA/TCAM/SRAM configuration.
used for look-up. We will discuss LTIN[1:0] and SEGSEL[2:0] in more detail in the next
subsection.
In Fig. 53, we use a Routing table look-up as an example to show a typical look-up oper-
ation. At time t0, the hardware signaling accelerator activates the REQSTB signal, indi-
cating that there is a look-up request. At the same time, it drives the look-up command onto
the CMD bus, and the destination IP address onto the REQDATA bus. Five clock cycles later
(in CCLK cycles; we will explain later that the frequency of the internal working clock is only
half that of the I/O clock, CLK2X), at time t10, the matched address appears on the ADR bus
connecting to the SRAM device. The TCAM device also drives the CE and R/W signals to
enable the SRAM read operation. Two clock cycles later, at time t14, the returned next-
hop IP address is available on the SRAM data bus. The HITACK signal indicates that it is
a successful look-up. The signals used for the table look-up operation are summarized in
Table 17.
Although it takes seven clock cycles to finish a typical look-up operation, it is not nec-
essary to wait until the end of the previous look-up to start a new one, as long as there is
no dependency between two consecutive look-up operations. For example, if the current
look-up operation is conducted on the Outgoing Connectivity table, a look-up on the
Figure 52. TCAM command bus.
Incoming Connectivity table can start immediately after it, followed by a look-up on the
User/Control Mapping table in the next clock cycle.
6.2.3.2 The organization of the data tables
The TCAM device is typically split into segments for two important reasons. First,
each segment can be selectively turned off to reduce power consumption. Second, each
segment can be flexibly configured, and multiple segments can be combined, to accom-
modate tables with different widths and depths. This feature is especially useful when
multiple data tables with different widths and depths reside in the same TCAM device.
Table 17. Typical TCAM and SRAM signals.
Signal         I/O  Description
REQSTB         I    Request strobe
Command Bus:
  INST         I    Instruction
  LTIN         I    Look-up type
  SEGSEL       I    Segment select
REQDATA        I    Request data bus
To SRAM:
  ADR          O    Address
  CE           O    Chip enable
  R/W          O    Read/Write enable
HITACK         O    Match acknowledge
From SRAM:
  IO[31:0]     I/O  Data from SRAM
Figure 53. 72-Bit look-up timing diagram.
The TCAM device we use, IDT75P52100, has sixteen independent segments, and five
data tables share these segments.
The IDT75P52100 has eight Segment Select Registers (SSRs), each with 16 bits representing
the 16 segments. A bit value of '1' indicates that the corresponding segment is selected, and vice
versa. In each look-up command, the SEGSEL[2:0] signal specifies which SSR is used for the
current table look-up operation, and the LTIN[1:0] signal specifies the width of the data table.
The Routing table has 32K entries, each a 32-bit IP address. In the TCAM device, it occupies
eight segments, each configured as 4K x 72-bit (72-bit is the minimum width of a possible
configuration). The Incoming Connectivity table has 64 entries, 38 bits each (a 32-bit neigh-
bor's IP address plus a 6-bit interface ID). In the TCAM device, it occupies one segment (the
minimum unit for configuration), which is configured as 4K x 72-bit. The Outgoing Con-
nectivity table (64 x 38-bit) and User/Control Mapping table (64 x 32-bit) are config-
ured in a similar way. The State table has 1K entries, each entry with 128 bits (a five-tuple,
as we explained before). In the TCAM device, it occupies one segment, which is configured
as 2K x 144-bit. Fig. 54 shows the configurations of the five data tables in the TCAM
device, and the corresponding look-up commands and SSR registers.
Figure 54. Configurations of the five data tables in the TCAM device.
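The SSR bitmaps of Fig. 54 can be expressed compactly (a Python sketch; the helper name, and the bit ordering with segment 0 as the most-significant bit as drawn in Fig. 54, are our assumptions):

```python
# Segment Select Register (SSR) bitmaps from Fig. 54: sixteen segments, one
# bit each; a '1' bit selects the corresponding segment for the look-up.
def ssr(segments, total=16):
    """Build a 16-bit SSR value with the given segment indices selected."""
    value = 0
    for s in segments:
        value |= 1 << (total - 1 - s)   # segment 0 drawn as the MSB in Fig. 54
    return value

SEGSEL = {
    0b000: ssr(range(0, 8)),  # SegSel0: Routing table, eight 4K x 72-bit segments
    0b001: ssr([8]),          # SegSel1: Incoming Connectivity table
    0b010: ssr([9]),          # SegSel2: Outgoing Connectivity table
    0b011: ssr([10]),         # SegSel3: User/Control Mapping table
    0b100: ssr([12]),         # SegSel4: State table (2K x 144-bit)
}
```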
Typically a TCAM device has an associated SRAM device to store extra data; the
matched address is used to fetch the extra data from the SRAM device. The Routing
table, User/Control Mapping table, and State table work this way. However, the Incom-
ing and Outgoing Connectivity tables work differently. As we explained in
Chapter 4, the return value of the Incoming/Outgoing Connectivity table is the interface ID. In
our system, the switch fabric has 64 input/output interfaces, so the interface ID ranges from 0
to 63, or from "000000" to "111111" in binary. We can arrange the locations of the Incom-
ing and Outgoing Connectivity tables in the TCAM device in such a way that the least-significant 6
bits of the matched address represent the interface ID, as shown in Fig. 55. In this way we
save the clock cycles needed to access the SRAM device.
The final mapping of the data tables into the TCAM and SRAM devices is shown in
Fig. 56.
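The address trick of Fig. 55 amounts to a single mask (a Python sketch; the constant names are ours, the base addresses are those shown in Fig. 55):

```python
# Fig. 55 placement trick: the Connectivity tables sit at 0x8000-0x803F and
# 0x9000-0x903F, so the least-significant 6 bits of the matched TCAM address
# are the interface ID and no SRAM read is needed.
INCOMING_CONN_BASE = 0x8000
OUTGOING_CONN_BASE = 0x9000

def interface_id(matched_addr):
    return matched_addr & 0x3F   # interface ID 0-63
```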
Figure 55. Use matched address as interface ID.
6.2.4 User-plane device - VSC9182 [51]
In order to demonstrate a complete signaling system, we included a “thin” user plane,
with the sole purpose of interfacing the hardware signaling accelerator to an actual switch-
ing device. The Vitesse 64x64 switch fabric, VSC9182, is the selected switching device.
Since it is impractical to design a complete 64x64 switch with the user-plane I/O inter-
faces on a PCI board and since our focus is on the control-plane not the user-plane, we
omitted these I/O interface devices. For testing purposes, we do, however, include SMB
connectors on two ports of the VSC9182 switch fabric to show that user data flows through
after a circuit is established and stops when the circuit is released.

Figure 56. Organization of the data tables.
Fig. 57 shows the block diagram of the VSC9182. Its core is an interconnection
matrix and program memory. It has three interfaces: an input backplane interface consist-
ing of 64 Low Voltage Differential Signaling (LVDS) input channels, an output backplane
interface consisting of 64 LVDS output channels, and a CPU interface used to configure
the registers and program the switch fabric. Programming the switch fabric (space switch-
ing and time-slot interchanging) is performed via the CPU interface by presenting the out-
put channel and time-slot address on the address bus A[10:0] and the input channel and
time-slot address on the data bus D[9:0], thereby filling the program memory. There are 64
input channels and 64 output channels, each with 12 time-slots. Fig. 58 shows how a value
on the data bus or address bus represents a certain time-slot on a certain channel.
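The field packing of Fig. 58 can be sketched as follows (a Python model; the function name and range checks are ours, not part of the VSC9182 datasheet):

```python
# Sketch of the Fig. 58 word formats:
# A[10:0] = {register group (1b, 0 = switch map), destination channel (6b),
#            destination time-slot (4b)}
# D[9:0]  = {source channel (6b), source time-slot (4b)}
def program_words(dst_ch, dst_ts, src_ch, src_ts):
    assert 0 <= dst_ch < 64 and 0 <= src_ch < 64   # 64 channels
    assert 0 <= dst_ts < 12 and 0 <= src_ts < 12   # 12 time-slots per channel
    addr = (0 << 10) | (dst_ch << 4) | dst_ts      # register group 0 = switch
    data = (src_ch << 4) | src_ts
    return addr, data
```

Writing `data` to `addr` with a rising edge on WRB, then asserting CONFIG, would install the cross-connection.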
Figure 57. Block diagram of VSC9182.

Figure 58. Address bus and data bus. A[10:0] consists of a 1-bit register group select
(0 = switch), a 6-bit destination channel number (0-63, the output pin number), and a
4-bit destination time-slot number (0-11, where the first transmitted byte is time-slot 0).
D[9:0] consists of a 6-bit source channel number (0-63, the input pin number) and a
4-bit source time-slot number (0-11, where the first received byte is time-slot 0).

Fig. 59 shows the timing diagram to program the switch fabric. A rising edge on WRB
transfers the program information to the program memory. The programming does not
take effect until the CONFIG pin is asserted. The switch fabric may be programmed con-
tinuously, but the new map will not take effect until the user asserts the CONFIG signal.
The current address map may be read back by presenting the output channel and time-slot
address on A[10:0], releasing the D[9:0] signals and lowering RDB, causing the contents
of the register to appear at D[9:0].
The signals used to program the switch fabric are summarized in Table 18.
6.3 Clock distribution networks
Clock distribution is a critical issue in high-speed PCB design. In our system,
there are eight high-speed devices requiring different clock frequencies, ranging from
33MHz to 125MHz. Some devices need highly synchronous timing. In addition, the PCI bus
has a 33MHz system clock supplied by the host computer. Table 19 lists the devices and
Table 18. Signals used to program the switch fabric.
Signal    I/O  Description
D[9:0]    B    Data bus
A[10:0]   I    Address bus
CSB       I    Chip select
WRB       I    Write enable
RDB       I    Read enable
CONFIG    I    Config enable
Figure 59. Timing diagram to program the switch fabric.
the related clock signals.
Fig. 60 shows the clock distribution scheme, which involves four clock sources.
Among these, the 33MHz PCI clock (PCLK) is supplied by the host computer through the
PCI local bus. There are three on-board oscillators providing local timing. The 78MHz
oscillator provides a system clock to the VSC9182 (SYSCLKP/N), from which the on-chip
PLL synthesizes the 622MHz internal clock with a multiplication ratio of 8 (to be accu-
rate, the calculation is as follows: 77.76MHz × 8 = 622.08MHz). Correspondingly, all
input/output lines (LVDS serial signals) run at 622.08Mbps (STS-12). A second, 125MHz oscillator
provides the transmit clock (TCLK) to the L8104 GbE MAC device. This 125MHz
input clock is used by the 8B/10B PCS section and generates the 125MHz transmit output
Table 19. Devices and related clock signals.
Device        Function        Signal      Frequency   Sync. to
HDMP-1636A    SerDes          REFCLK      125MHz      L8104
L8104         GbE MAC         TCLK        125MHz      HDMP-1636A
                              SCLK        33MHz       N/A
                              REGCLK      33MHz       N/A
IDT75P52100   TCAM            CLK2X       100MHz      IDT71V2556S
IDT71V2556S   SRAM            CLK         50MHz       IDT75P52100
IDT72V36100   FIFO            WCLK        50MHz       N/A
                              RCLK        50MHz       N/A
VSC9182       Switch fabric   SYSCLKP/N   78MHz       N/A
XC2V3000      FPGA            GCLK        100MHz      N/A
Figure 60. Clock distribution scheme.
clock (TBC), which is used to output data on the 10-bit PHY interface. The 125MHz TBC
is also sent to the HDMP-1636A as a reference clock (REFCLK). This clock is internally mul-
tiplied by 10 to generate the high-speed clock (1250MHz) necessary for clocking the high-
speed serial outputs (1250MBd/sec, or 1Gbps).
A third, 100MHz oscillator provides base timing for all other devices. Since the FPGA device,
the XC2V3000, is located at the center of the board and all other devices are distributed
around it, the FPGA is a good hub for clock distribution. Besides, the XC2V3000 has
twelve Digital Clock Manager (DCM) modules, which can provide precise clock de-skew,
flexible frequency synthesis, and high-resolution phase shifting. In our clock distribution
scheme, the 100MHz oscillator is connected to the global clock pin (GCLK) of the FPGA.
Inside the FPGA, using the 100MHz GCLK as a reference clock, four DCMs synthe-
size the clock signals for the other devices, as shown in Fig. 60.
The FIFO device, IDT72V36100, is driven by a 50MHz clock signal. Although the
read clock (RCLK) and the write clock (WCLK) can be independent, both clocks are con-
nected to the same clock source in our design. The TCAM device, IDT75P52100, is
driven by a 100MHz clock signal (CLK2X). However, the internal working frequency
is 50MHz, generated with an internal 1/2 divider; the CLK2X provides a
Dual Data Rate (DDR) I/O interface. The timing of the TCAM device and the attached SRAM
device must be highly synchronous. For this reason, the attached SRAM device,
IDT71V2556A, is driven by a clock that originates from the same DCM as the TCAM
device's clock but sits beyond a 1/2 divider inside the FPGA. We have a second SRAM
device for outgoing message buffering. A separate DCM is used to supply a 50MHz clock
to this SRAM device. As we discussed before, the GbE MAC device, L8104, has a clock
125MHz
1250MHz
1250MBd/sec 1Gbps
100MHz
100MHz
50MHz
100MHz
50MHz 1 2⁄ 100MHz
1 2⁄
50MHz
86
input from an oscillator. In addition to the 125MHz transmit clock (TCLK), it has two more
clock inputs, system clock (SCLK) and register-interface clock (REGCLK). These two can
be independent, but in our design, they are connected to the same 33MHz clock source.

6.4 Power distribution scheme
Power distribution is another important issue for high-speed PCB design.
Although the PCI specification [52] states that "the maximum power allowed for any PCI
board is 25 watts," in a typical personal computer system, the maximum power a PCI
board can draw from the main board is less than 10 watts. Table 20 lists the main devices
and their power dissipations. Obviously, the power supply from the PCI bus cannot
meet the total power requirement.
We plan to use external power supplies to solve this problem. This is common in
practice. For example, high-end video cards, such as the Radeon® 9800XT, often require
an external power supply because of the power-consuming Graphics Processing Unit
(GPU). On our prototype board, we plan to put two extra power supply adapters: one
supplies 1.8V and 2.5V to on-board devices, the other supplies 3.3V.

Table 20. Power dissipations of main devices.
Device        Function        Power dissipation
HDMP-1636A    SerDes          900mW
HFBR-53D5     Optical txcvr.  630mW
L8104         GbE MAC         2.2W
IDT75P52100   TCAM            10W
IDT71V2556S   SRAM            2W
IDT72V36100   FIFO            (1)
VSC9182       Switch fabric   <16W
XC2V3000      FPGA            ≈2.5W
1. Data not available
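As a quick sanity check of this claim, the Table 20 figures can be totaled and compared with the roughly 10 watts practically available from a PCI slot. This is only a sketch: the VSC9182 value is an upper bound, the XC2V3000 value is an estimate, and the FIFO is omitted because its dissipation was unavailable.

```python
# Rough power-budget check against the ~10W practically available from a PCI slot.
# Values are the Table 20 figures; VSC9182 is an upper bound, XC2V3000 an estimate.
power_w = {
    "HDMP-1636A SerDes": 0.9,
    "HFBR-53D5 optical txcvr": 0.63,
    "L8104 GbE MAC": 2.2,
    "IDT75P52100 TCAM": 10.0,
    "IDT71V2556S SRAM": 2.0,
    "VSC9182 switch fabric": 16.0,   # upper bound
    "XC2V3000 FPGA": 2.5,            # approximate
}
total = sum(power_w.values())
print(f"total board power ~ {total:.2f} W")   # ~34.23 W
print(f"exceeds PCI budget: {total > 10.0}")  # True
```

Even with the FIFO excluded and the bounds read generously, the total is several times the PCI budget, which is why the external adapters are needed.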
Chapter 7 Design and Analysis of Signaling Networks
While much attention has been paid to the definition and implementation of signaling
protocols, little attention has been given to the transport mechanisms for sending signaling
messages between circuit switches. The RSVP-TE specification [15] allows for the signal-
ing messages to be carried directly in IP packets, but it does not constrain the links/paths
used to transport these IP packets in any way.
Most vendors have allowed for two options for the signaling “links”: (i) in-band sig-
naling channels, which are realized as bandwidth set aside within user-plane interfaces
between two switches, e.g., the Data Communication Channel (DCC) within each SONET
signal, and (ii) out-of-band signaling channels, e.g., paths through the Internet accessed
via an Ethernet interface from the control (call) processor of the switch. Service providers
are given the flexibility to choose between these two options. In this chapter, we compare
these two options and provide some quantitative guidance to service providers on the
question of which option to use.
Our comparison mainly focuses on the call setup delay, which includes (i) signaling
message processing delays, (ii) round-trip propagation delay, and (iii) signaling message
emission delays. In Chapter 5, we introduced an FPGA-based hardware signaling acceler-
ator, which reduces the signaling message processing overhead to the order of microsec-
onds. The second component, round-trip propagation delay, cannot be reduced due to
speed-of-light constraints. The goal of this work is to reduce the third component, emis-
sion delays, without of course compromising utilization.
7.1 Introduction to in-band signaling and out-of-band signaling
Fig. 61 illustrates the two options for signaling transport: in-band signaling and out-
of-band signaling. In the in-band signaling option, the signaling traffic shares the same
channel as the data traffic. For example, the DCC channel in SONET signal can be used to
transport signaling messages, as shown in Fig. 62a. However, the bandwidth of the DCC
channel is limited. For each separate OC-1 signal, the Section DCC (SDCC) has a band-
width of 192Kbps, and the Line DCC (LDCC) has a bandwidth of 576Kbps.
There are two alternatives that can overcome the bandwidth limitation of the DCC
channel. In the first case, a switch interface is partitioned into signaling bandwidth and
user-plane bandwidth. For example, one or more OC-1s can be set aside within each user-
plane interface to exclusively carry signaling messages (e.g., Fig. 62b shows an OC-1 sig-
naling channel within an OC-12 interface). In the second case, one or more OC-1s of one
user-plane interface between two switches, which are connected by N interfaces, are set
Figure 61. In-band and out-of-band signaling options. (a) In-band signaling. (b) Out-of-band signaling through an IP network.
Figure 62. Three cases of in-band signaling. (a) In-band signaling case 1. (b) In-band signaling case 2. (c) In-band signaling case 3.
aside for signaling, as shown in Fig. 62c. The disadvantage of these two alternatives is that
there is revenue loss due to the allocation of user-plane bandwidth to signaling.
Fig. 61b illustrates the out-of-band signaling option. The key difference is that the path
taken by the signaling channel can be different from the path taken by the user-plane inter-
faces that it supports. For example, the signaling channel could pass through packet
switches while the user-plane interfaces are direct (logical or physical) between two
switches. In other words, the signaling channel is separate from and independent of the
data channel. A classical example of out-of-band signaling is the SS7 network, which is
used to carry signaling messages between DS0-based telephone circuit switches. SS7 is a
connectionless packet-switched technology.
Unlike in telephone networks, where it was economically feasible to create a dedicated
connectionless packet-switched network just for signaling traffic, in today's environment,
with the ubiquity of the Internet, it is more likely that GMPLS-enabled SONET/SDH/
WDM circuit switches will leverage the Internet for signaling message transport. A ser-
vice provider could simply connect the control processors of their GMPLS-enabled circuit
switches to the Internet and expect it to route signaling messages as needed. A potential
drawback is latency. We create models for such out-of-band signaling channels routed
through the Internet, and analyze these models to predict performance.
7.2 Delay models
In this section, we set up and analyze models for the two signaling transport options,
in-band signaling and out-of-band signaling. Our goal is to compute the total delay
incurred in processing and sending a signaling message from one switch to the next suc-
cessfully. This consists of the following components: (i) queueing delay plus service time
at the signaling processor, (ii) queueing delay plus transmission time on the signaling
channel, (iii) total delay to successfully send the message on the signaling channel, which
includes retransmissions in case of errors. We list our assumptions and notation, and then
describe our queueing model.
7.2.1 Assumptions
We assume that the call arrival process is Poisson, which has been shown to hold true
for FTP applications in [53]. Each call involves multiple signaling messages. For example,
in RSVP-TE, each call involves Path messages, Resv messages, and PathTear/ResvTear
messages. We assume the arrival of signaling messages is also Poisson. Although the mes-
sages involved in a call are related, we argue that since the duration of a call is random, the
arrival times of PathTear/ResvTear messages are independent of those of Path and Resv
messages. Besides, after a switch sends out a Path message, the Path message may go
through a random number of downstream switches before reaching the destination. The
time to the reception of a Resv message following the transmission of a Path message is
thus randomized.
The service times at signaling processors and the transmission times at signaling chan-
nels are both assumed to be deterministic. The processing of different RSVP-TE signaling
messages takes different amounts of time, with the Path message consuming the most. However, in
Chapter 5, we introduced a pipelined architecture for hardware-acceleration of RSVP-TE
signaling message processing. In order to achieve full pipelining and the highest through-
put, we purposely inserted dummy cycles when processing Resv, PathTear, and ResvTear
messages so that the processing of all four messages takes the same number of clock
cycles. Even without such an implementation, the difference in processing times is not
significant for different message types. As for the sizes of signaling messages, the Path and
Resv messages are roughly equivalent in size, while the PathTear and ResvTear messages are
roughly equivalent in size. The approximate size is a few hundred bytes. Therefore we assume
that signaling processing times and signaling transmission times are deterministic.
We assume that all messages fit in one packet, since the maximum transmission unit
size of most existing networks is larger than the size of an RSVP-TE message.
7.2.2 Notation
Our notation is shown in Table 21. In the rest of this chapter, the superscripts OOB and
IB may be used on some of these parameters as appropriate for out-of-band and in-band,
respectively.
7.2.3 Queueing model
There are two servers related to signaling at a GMPLS-enabled circuit switch: (i) sig-
naling protocol processor, and (ii) signaling channel transmitter. We assume that the
Table 21. Notation.
Symbol   Meaning
λ        Aggregate signaling message arrival rate
µproc    Service rate of the signaling processor
µtx      Service rate of the signaling channel transmitter
n        Number of GMPLS-enabled neighbors of a switch (also the number of in-band signaling channels)
qj       Probability of sending an outgoing signaling message to the jth in-band signaling channel; j = 1, …, n
Tproc    A random variable denoting the response time at the signaling protocol processor (waiting time plus service time)
Ttx      A random variable denoting the response time at the signaling channel transmitter (waiting time plus service time)
Tn       One-way network delay in sending a signaling message from one switch to the next
To       Initial time-out value of the retransmission timer at the sender
p        Probability of packet loss
switch has a single signaling protocol processor irrespective of whether the signaling link
solution is in-band or out-of-band. We first describe our queueing model for the signaling
protocol processor and the signaling channel transmitter without considerations of mes-
sage loss and subsequent re-transmissions. We then improve this model by adding in the
possibility of message loss and re-transmissions.
7.2.3.1 Model without retransmissions
Fig. 63 illustrates our queueing models of these two servers for both in-band and out-
of-band solutions. In the in-band solution, there are n signaling channels, given our
assumption of n GMPLS-enabled neighbors to the switch (see Table 21). In the out-of-
band solution, we assume that there is only one out-of-band signaling channel.
Given that the models shown in Fig. 63 are networks of queues, we consulted the liter-
ature [54] on modeling such queueing networks. Most of the solutions for queueing net-
works, such as Burke's theorem and Jackson's theorem, are for a network of M/M/1
queues, while in our case, the service times of both queues are deterministic.
Even though there is no analytical derivation for a network of queues with determinis-
tic service times (to our knowledge), it turns out that if we take into account practical con-
siderations, this network of queues reduces to a single M/D/1 queue. Our reasoning is as
follows. The service rate of the signaling processor, µproc, is determined by the switch

Figure 63. Queueing models of the signaling protocol processor and the signaling channel transmitters. (a) In-band signaling: the processor (rate µproc) feeds n transmitter queues with rates µtx^IB1, …, µtx^IBn, selected with probabilities q1, …, qn, where the qi sum to 1. (b) Out-of-band signaling: the processor feeds a single transmitter queue with rate µtx^OOB.
vendor, while the service rate of the signaling channel, µtx, either in-band or out-of-band,
is determined by the service provider. We expect the switch vendor to select µproc so that
the user-plane interfaces and the signaling protocol processor are equally utilized. Once
this decision is made and a switch is manufactured, µproc is fixed. A service provider
who purchases this switch then has only one degree of freedom to choose an appropriate
bandwidth level for the signaling channels, i.e., the service provider can select µtx.
The service provider should limit µtx ≤ µproc, because choosing a µtx larger than µproc
is an unnecessary waste of bandwidth. Therefore we attempt to solve the network-of-
queues model in Fig. 63 for only two cases, µtx = µproc and µtx < µproc. We further
limit the second case to only µtx ≪ µproc. Reasoning that a service provider may choose
µtx < µproc for cost reasons, we further argue that in this case it is more likely that
µtx ≪ µproc, because cost differentials for minor drops in transmission rates are often not
significant (e.g., compare the cost of an 800Mbps circuit vs. a 1Gbps circuit). Clearly the
service provider should recognize that in choosing this option instead of the µtx = µproc
option, the network is being planned for a lower call arrival rate than the maximum rate
for which the switch is designed.
For these two cases, µtx = µproc and µtx ≪ µproc, the queueing network model of
Fig. 63 is reduced to a single M/D/1 queue. This is because under our assumption that the
service time at the signaling protocol processor is deterministic, the departure rate is no
more than µproc. If µtx = µproc, the second server (signaling channel transmitter) can
keep up with the arriving signaling messages. In other words, the second queue is always
empty, and the delay at the second queue is solely the service time (or emission time). On
the other hand, if µtx ≪ µproc, since λ ≤ µtx for the system to be stable, λ ≪ µproc, which
implies that the first queueing system is lightly loaded. We therefore assume that when a
signaling message arrives, it is served immediately and the first queue is always empty.
For these two cases, µtx = µproc and µtx ≪ µproc, the queueing network model of
Fig. 63 is reduced to a single queue. The mean response time for a signaling message at a
switch, E[Tsw], is given by:

E[Tsw] = E[Tproc] + 1/µtx,    if µtx = µproc      (1)

E[Tsw] = E[Ttx] + 1/µproc,    if µtx ≪ µproc      (2)

where E[Tproc] is the mean response time at the signaling protocol processor queue, and
E[Ttx] is the mean response time at the signaling channel transmitter queue (see Fig. 63).
The above reasoning applies directly to the out-of-band case (Fig. 63b), since in this
case there is only one signaling channel. In the in-band solution, a similar reasoning holds
if the individual signaling channel rates are selected according to the probabilities qj (see
Fig. 63 and Table 21), i.e., µtx^IBj = qj·µproc, j = 1, 2, …, n. Using the argument stated
before, we assume that the service provider will either choose all the in-band signaling
channel rates such that µtx^IBj = qj·µproc, or choose rates that are much lower,
µtx^IBj ≪ qj·µproc. Under these conditions, our earlier reasoning of treating the queueing
network as a single queue holds.

7.2.3.2 Model including re-transmissions
In this model, we include the third component of the delay, which is Tn, the one-way
network delay (see Table 21). This delay is impacted by losses and re-transmissions. Mes-
sage loss is possible irrespective of whether the signaling channel is in-band or out-of-
band. The reasons for message loss in the in-band case are bit- and burst-errors on
the links, and receive-buffer overflows due to flow control problems. Bit- and burst-link
errors arise due to noise and interference on the physical media. Even though optical fiber,
the physical medium of these high-speed circuit-switched networks, is fairly reliable, link
errors are unavoidable. Receive-buffer overflows will occur if the signaling protocol pro-
cessor at the sending switch is faster than that at the receiving side. Since different switch
vendors could use different implementation techniques for the signaling protocol proces-
sor, this flow control problem may arise. In the out-of-band solution, an additional cause
of message loss is buffer overflow at any of the routers on the path between the sending
and receiving switches.
Since RSVP-TE is specified to run directly over raw IP, which does not offer a reliable
service, RFC 2961 [40] proposes an exponential back-off retransmission algorithm to pro-
vide reliability.
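The doubling behavior of this retransmission algorithm can be sketched as follows. This is a minimal illustration of the exponential back-off idea, not RFC 2961's exact procedure; the time-out value and retry limit below are arbitrary examples.

```python
# Sketch of an exponential back-off retransmission schedule: resend after T0,
# then 2*T0, 4*T0, ..., until acknowledged or a retry limit is reached.
# Timings and limits are illustrative, not RFC 2961's recommended defaults.
def retransmission_schedule(t0: float, max_retries: int) -> list[float]:
    """Absolute send times (first transmission at t=0) assuming no ack arrives."""
    times, t, interval = [0.0], 0.0, t0
    for _ in range(max_retries):
        t += interval
        times.append(t)
        interval *= 2  # exponential doubling of the time-out
    return times

print(retransmission_schedule(t0=0.5, max_retries=4))
# [0.0, 0.5, 1.5, 3.5, 7.5]
```

The k-th retransmission thus occurs at (2^k − 1)·T0, which is exactly the weighting that produces the p/(1 − 2p) term in the delay model of the next section.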
Assuming that retransmissions will, in typical implementations, join the signaling
channel transmitter queue, and that it is valid to rewrite (1) to be the same as (2) (in other
words, for both cases, µtx = µproc and µtx ≪ µproc, we use (2)), we re-draw the model
for the second queue (the signaling channel transmitter) as shown in Fig. 64. With proba-
bility p, a message is lost, which means a time-out occurs, and the message is re-transmit-
ted. To model exponential doubling, we show the time-out value as a function f of the
initial time-out value To.

Figure 64. Signaling channel transmitter model including re-transmissions. (Arrivals at rate λ join the transmitter queue; with probability 1 − p a message is delivered, and with probability p it times out after f(To) and re-enters the queue.)
The mean response time at the transmitter queue, E[Ttx], is the sum of the mean waiting
time, E[W], and the service delay (which is a constant, 1/µtx, given our assumptions).
The mean waiting time, E[W], can be computed assuming that the message arrival rate at
the transmitter queue is given by:

λ + λp + λp² + λp³ + … = λ/(1 − p)      (3)

but for small p, this can be approximated to λ.
Since in RSVP-TE the signaling protocol processor is also involved in handling re-
transmissions, the same load appears at both the signaling protocol processor and the sig-
naling channel transmitter. Thus, re-writing (1) to be the same as (2) is not affected
by retransmissions.
Using µproc, Tn, Ttx, T0, and p (see Table 21), and following the exponential back-
off re-transmission algorithm, we determine the mean total time to process and send the
outgoing signaling message successfully to the next-hop switch as:

E[T] = 1/µproc + Tn + E[Ttx]·((1−p) + 2p(1−p) + 3p²(1−p) + …)
       + T0·(p(1−p) + 3p²(1−p) + 7p³(1−p) + …)      (4)

The explanation for the above equation is as follows. The first two terms, 1/µproc and
Tn, correspond to the processing delay and the time to transmit a message successfully
one-way from the sending switch to the receiving switch. The next term corresponds to the
mean response time (mean waiting time plus mean service time) in the signaling
channel transmitter queue. This delay is incurred once if the first transmission is success-
ful (probability 1−p), twice if the first transmission is unsuccessful but the second
transmission is a success, and so on. The last term in (4) corresponds to the time-out value,
which is doubled for every re-transmission. Summing the series yields:

E[T] = 1/µproc + Tn + (1/(1−p))·E[Ttx] + (p/(1−2p))·T0      (5)

Assuming T0 = 3Tn (counting round-trip network delay and other minor delays), (5)
can be re-written as:

E[T] = 1/µproc + (1/(1−p))·E[Ttx] + ((1+p)/(1−2p))·Tn      (6)

Using the mean response time of an M/D/1 queue for E[Ttx], we have:

E[T] = 1/µproc + (1/(1−p)) · ((1 − ρ/2)/(1 − ρ)) · (1/µtx) + ((1+p)/(1−2p))·Tn      (7)

where ρ = λ/µtx. The first term, 1/µproc, is common to both in-band signaling and
out-of-band signaling. Therefore, p, µtx, and Tn determine which is better, in-band sig-
naling or out-of-band signaling, under different sets of conditions. We consider different
numerical values for these input parameters and compare the two signaling transport solu-
tions.
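Equation (7) is straightforward to evaluate numerically. The sketch below plugs in the hardware-signaling, metro-area values of Table 22 (µproc = µtx = 200,000/sec; Tn of 0.2ms in-band vs. 0.5ms out-of-band; p of 10^-4 vs. 1%) at an assumed 50% transmitter load; the function and parameter names are ours, not part of the dissertation's implementation.

```python
# Numerical sketch of equation (7): mean total delay to process and send a
# signaling message = processing time + M/D/1 response time at the transmitter
# (scaled for retransmissions) + the network-delay term.
def mean_total_delay(lam, mu_proc, mu_tx, t_n, p):
    rho = lam / mu_tx                        # transmitter utilization
    assert rho < 1 and p < 0.5               # stability; (1 - 2p) must be positive
    mdl = (1 - rho / 2) / (1 - rho) / mu_tx  # M/D/1 mean response time
    return 1 / mu_proc + mdl / (1 - p) + (1 + p) / (1 - 2 * p) * t_n

lam = 0.5 * 200_000                          # assumed load: rho = 0.5
in_band  = mean_total_delay(lam, 200_000, 200_000, 0.2e-3, 1e-4)
out_band = mean_total_delay(lam, 200_000, 200_000, 0.5e-3, 0.01)
print(f"in-band     : {in_band * 1e3:.3f} ms")
print(f"out-of-band : {out_band * 1e3:.3f} ms")  # in-band is smaller here, as in Fig. 65
```

With these inputs the Tn term dominates, so the in-band option wins, matching the metro-area conclusion drawn from Fig. 65.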
7.3 Numerical results and implications
7.3.1 Input parameter values
With the hardware signaling accelerator introduced in Chapter 5, RSVP-TE signaling
messages can be processed within several microseconds. Accordingly, we assume µproc is
200,000/sec for a hardware signaling accelerator and 200/sec for a software signaling
processor.
For hardware-accelerated signaling, when µtx = µproc, µtx is 200,000/sec; when
µtx ≪ µproc, we assume µtx = (1/5)µproc, or 40,000/sec. For software signaling, even
when µtx = µproc, µtx is 200/sec, or 200Kbps (assuming 125 bytes or 1000 bits per

Table 22. Input parameter values.
Symbol   Value
λ        Varied from 0.01µtx to 0.95µtx
µproc    200,000/sec, hardware signaling; 200/sec, software signaling
µtx      200,000/sec, hardware signaling, µtx = µproc;
         40,000/sec, hardware signaling, µtx ≪ µproc;
         200/sec, software signaling
n        5 and 10. For the purpose of comparison, the aggregate bandwidths of in-band
         and out-of-band signaling are the same, therefore µtx^IBj = µtx^OOB/n
qj       Probability of sending an outgoing signaling message to the jth in-band
         signaling channel; j = 1, …, n
Tn^IBj   0.2ms in metro area; 25ms in wide area
Tn^OOB   0.5ms and 1ms in metro area; 35ms and 45ms in wide area
To^IBj   3Tn^IBj
To^OOB   3Tn^OOB
p^IBj    10^-8 and 10^-4
p^OOB    1% and 5%
message). Considering the fact that even a single SDCC channel has a bandwidth of
192Kbps, assigning a µtx value less than 200/sec is impractical. Therefore we do not
consider µtx ≪ µproc for software signaling.
In order to choose reasonable values for Tn and p, we refer to the National Laboratory
for Applied Network Research (NLANR) [55] for Round-Trip Time (RTT) and packet
loss rate measurements. We randomly chose North Carolina State University (NCSU) as our
starting point. For the data collected on June 14, 2004, we noticed that the RTT between
NCSU and the North Carolina Networking Initiative (NCNI) is 1ms. We used the corre-
sponding one-way delay, 0.5ms, for our metro setting. The distance between these two
locations is 20.6 miles; the propagation delay is around 0.16ms. We also notice that over a
similar geographical distance (22.6 miles, to be accurate), the RTT between NCSU and the
University of North Carolina, Chapel Hill (UNC) is 2ms, or 1ms one-way. The differ-
ent RTTs may result from different network conditions. Based on the data obtained from
NLANR, we choose 0.5ms and 1ms for Tn^OOB in the metro area. We choose 0.2ms for
Tn^IBj, since this delay is mainly propagation delay. In the wide area, for example from
NCSU to locations in California, the RTTs range from 75ms (California Institute of Tech-
nology, CIT) to 89.99ms (San Diego State University, SDSU). We choose 35ms and
45ms for Tn^OOB, and 25ms for Tn^IBj (network delay). The reason we allow for both a
metro-area and a wide-area setting is that some service providers may connect two signal-
ing-enabled switches located across a wide area with logical links, making them neighbors
from a signaling point of view. In the telephone network, for example, many carriers do
interconnect signaling-enabled switches in California directly via logical links (for the
user-plane interfaces) with switches in New York.
We choose 1% and 5% for p^OOB. As we explained before, even for in-band signaling,
bit- and burst-link errors and buffer overflows may cause packet loss. For p^IBj, we choose
10^-4 and 10^-8.
In the following subsections, we show the delays under different scenarios. In
each case, we vary the aggregate signaling message arrival rate λ from 0.01µtx to 0.95µtx,
to show the delays under different loads.

7.3.2 Hardware-accelerated signaling engine, µtx = µproc

Figure 65. In-band/out-of-band signaling with hardware signaling, metro area, µtx = µproc.
From Fig. 65, we can observe that in the metro area, when hardware-accelerated sig-
naling is used and µtx = µproc, in-band signaling always outperforms out-of-band sig-
naling. Referring to (7), the first two terms are relatively small (on the order of µs) when
compared to Tn, which is on the order of ms. Thus, the total delay is mainly determined by
Tn. Since Tn^IBj is less than Tn^OOB, the in-band solution does a better job.
According to (7), when p is small, its impact on the total delay is negligible. For in-
band signaling, the values of p we use are 10^-8 and 10^-4 (see Table 22). In Fig. 65 and
all following figures, we cannot observe a notable difference between the in-band signaling
cases for p^IBj = 10^-4 and p^IBj = 10^-8.
Fig. 66 shows the total delay in the wide area. Here, the total delay is even more dom-
inated by Tn, which is significantly higher than the first two terms of (7), and hence in-
band signaling outperforms out-of-band signaling.

Figure 66. In-band/out-of-band signaling with hardware signaling, wide area, µtx = µproc.
7.3.3 Hardware-accelerated signaling engine, µtx ≪ µproc
When µtx ≪ µproc, for hardware-accelerated signaling, the first term of (7) is still neg-
ligible, but the queueing delay at the transmitter (i.e., the second term of (7)) becomes
comparable to the third term in the metro-area setting, where Tn is in the hundreds of µs.
When ρ is high, the factor (1 − ρ/2)/(1 − ρ) starts increasing to the point where the sec-
ond term dominates the total mean delay. In this case out-of-band signaling outper-
forms in-band signaling, because µtx is five times higher in the out-of-band case than in
the in-band case (see Table 22). In the wide area, Tn still dominates, and hence the lower
rate of the in-band channel is compensated by the increased Tn of the out-of-band solu-
tion.

Figure 67. In-band/out-of-band signaling with hardware signaling, metro area, µtx ≪ µproc.
7.3.4 Software signaling processor
From Fig. 69, we can observe that out-of-band signaling outperforms in-band signal-
ing in the metro area when software signaling processors are used. This is because the
message processing delay and emission delays (the first two terms of (7)) dominate the total
delay.
In the wide area (Fig. 70), the choice between in-band and out-of-band signaling really
depends on the choice of parameters. When Tn = 35ms and p = 1%, out-of-band sig-
naling demonstrates the best performance. But we do note that the RTT measurements
obtained from the NLANR site were made primarily on Internet2, which is lightly loaded.
On the Internet, RTTs are likely to be significantly higher. In this case, even with software
signaling processors, the network delay may dominate, making in-band signaling a better
choice.

Figure 68. In-band/out-of-band signaling with hardware signaling, wide area, µtx ≪ µproc.
7.4 Some comments
We compared the in-band and out-of-band signaling transport options under the assump-
tions of the switches having hardware signaling accelerator engines or software signaling
processors. With a hardware signaling accelerator, if the bandwidth allocated for the in-band
signaling channels is on par with the sped-up processing rates, then the main source of
delay in sending a signaling message is the network delay. Using an out-of-band path
through the public Internet could make this component significant and thus obliterate the
gains in call setup delays made with hardware signaling accelerators. In this case, fast in-
band signaling channels are required. For the parameters we considered, this required set-
ting aside one OC-1 for each neighbor switch, to which there could potentially be a large
number of interfaces, each at a multi-OC-1 rate. On the other hand, with software signaling
processors, we found that network delays matter much less, and hence the cheaper out-of-
band public Internet option is sufficient in the metro area. But in the wide area, in-band
signaling is typically a better option.

Figure 69. In-band/out-of-band signaling with software signaling, metro area.
Figure 70. In-band/out-of-band signaling with software signaling, wide area.
Chapter 8 Applications of Hardware-Accelerated
Signaling
The impact of hardware-accelerated signaling is quite far-reaching. It will fundamen-
tally change our current view of connection-oriented networks. With the call processing
delay per switch decreasing to the order of microseconds, and correspondingly the call
handling rate increasing to the order of 10^5 calls/sec, connections can be established and
released much faster, and switches can handle more requests and maintain more simultaneous
connections. This allows for establishing end-to-end connections even for applications
with short holding times, achieving better QoS while maintaining good resource utiliza-
tion. Furthermore, a new network architecture is suggested, in which multiple distributed
small switches replace a large switch. This new architecture leads to cheaper networks
with better utilization. Another example is the application of hardware-accelerated signal-
ing in network survivability. In this chapter, we briefly discuss these two applications. We
try to shed fresh light on the possible applications of hardware-accelerated signaling. This
is only a starting point. We expect more work in the future on applications that will lever-
age the advantage of connection-oriented networks given this significant technology
advance of hardware-accelerated signaling.
8.1 Utilization analysis of circuit-switched networks for file transfers
8.1.1 Problem statement
In [5], we propose to use end-to-end circuits for large file transfers. In [57], we further
investigated the possibility of applying this idea in Storage Area Networks (SANs). A file
transfer has no intrinsic burstiness, which makes it suitable for circuit-switched networks.
Once an end-to-end circuit is established, a large file can be transferred at the constant rate
of the circuit. For example, transferring a 1GB MPEG4 video file over a 1Gbps circuit
takes around 8 seconds, while it takes only 800ms on a 10Gbps circuit.
The higher the call setup delay/transfer delay ratio, the lower the utilization. If pure
software-based signaling is used, the call setup delay per switch is on the order of a few
milliseconds. End-to-end call setup delays along paths traversing multiple switches can
add up to tens to hundreds of milliseconds. This signaling overhead is negligible com-
pared to the 8-second transfer time for a 1GB file on a 1Gbps circuit but becomes sig-
nificant when the circuit rate is 10Gbps. Furthermore, when the file size is small, the
signaling overhead leads to poor circuit utilization. For example, transferring a 5MB
MP3 music file over the same 1Gbps connection requires only around 40ms, which is
significant when the call setup delay is on the order of tens to hundreds of milliseconds.
Therefore, from a circuit utilization perspective, establishing an end-to-end circuit is
justifiable only for those applications with long holding times. For the file transfer
application, we define a crossover file size: only those files larger than the crossover
file size should be transferred on end-to-end circuits.
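The crossover idea can be made concrete with a short calculation: per-circuit utilization is transfer time divided by setup-plus-transfer time, so for a target utilization the crossover size follows directly. The 1Gbps rate and 90% utilization target below are illustrative assumptions, not values fixed by the analysis.

```python
# Per-circuit utilization = transfer time / (call setup delay + transfer time).
# Sketch of how the crossover file size falls as signaling gets faster;
# the 1 Gbps circuit and 90% utilization target are illustrative.
def per_circuit_utilization(file_bytes, rate_bps, setup_s):
    transfer = file_bytes * 8 / rate_bps
    return transfer / (setup_s + transfer)

def crossover_size(rate_bps, setup_s, target=0.9):
    # transfer/(setup+transfer) >= target  =>  transfer >= setup*target/(1-target)
    return setup_s * target / (1 - target) * rate_bps / 8

for setup in (100e-3, 1e-3, 10e-6):  # slow software, fast software, hardware
    print(f"setup {setup * 1e3:8.3f} ms -> crossover "
          f"{crossover_size(1e9, setup) / 1e6:10.5f} MB")
# crossover drops from ~112.5 MB at 100 ms setup to ~0.011 MB at 10 µs
```

Cutting the setup delay by four orders of magnitude shrinks the crossover size by the same factor, which is exactly why hardware-accelerated signaling opens circuits to small transfers.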
Our discussion above focused on only one aspect of utilization, i.e., per-circuit utiliza-
tion. It addresses the issue of whether the circuit, once reserved, is actively in use (trans-
ferring a file) or whether it is simply being held, waiting for the call setup phase to
complete. There is another aspect of utilization, which we call aggregate network utiliza-
tion. It determines whether or not a circuit is occupied. Circuit-switched networks are typ-
ically operated in call-blocking mode. The Erlang-B formula is widely used to model call
blocking probabilities in such networks. With fixed blocking probabilities, the Erlang-B
formula shows that the higher the offered traffic load, the higher the utilization (see Fig. 71;
also observe that the higher the blocking probability, the higher the utilization). When
applied to file transfers, the offered load is determined by the crossover file size. From
Fig. 72 we can see that the larger the crossover file size, the lower the effective offered
load.
In order to improve the per-circuit utilization, we need to limit the use of circuits to
large files. This results in low traffic load on the circuits, which, in turn, leads to low
aggregate network utilization. Hardware-accelerated signaling is key to addressing this
impasse. It effectively reduces the call setup overhead, making the per-circuit utilization
acceptable even for a small crossover file size, which, in turn, leads to a higher load and
hence higher aggregate network utilization. In this section, we analyze the impact of hard-
Figure 71. Aggregate network utilization vs. offered traffic load
((1 − Pb) × ρ / m, with Pb fixed and m varying, ρ = Erlang(Pb, m)).

Figure 72. Crossover file size vs. percentage of the offered load
(Pareto distribution, α = 1.06, k = 1000).
ware-accelerated signaling on the total network utilization, i.e., the product of both per-
circuit utilization and aggregate network utilization. Table 23 lists the notation we use in
the analysis.
8.1.2 Utilization analysis and the numerical results
Circuit-switched networks, such as telephone networks, typically work in call-block-
ing mode. If resources are available, the call is accepted; if resources are not available, the
call is rejected or blocked. Equation (8) shows the Erlang-B formula used to model the call
blocking probability.
Pb = (ρ^m / m!) / ( Σ_{k=0}^{m} ρ^k / k! )        (8)
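Equation (8) is awkward to evaluate directly because of the factorials; the standard recursion B(0, ρ) = 1, B(k, ρ) = ρ·B(k−1, ρ) / (k + ρ·B(k−1, ρ)) is numerically stable and gives the same result. A sketch (the trunk sizes and loads below are illustrative, not values from the text):

```python
def erlang_b(m, rho):
    """Blocking probability for offered load rho (Erlangs) on m circuits,
    computed with the stable recursion equivalent to the factorials in (8)."""
    b = 1.0                          # B(0, rho) = 1
    for k in range(1, m + 1):
        b = rho * b / (k + rho * b)
    return b

# Heavier offered load on the same trunk group -> higher blocking,
# but also higher aggregate utilization (1 - Pb) * rho / m, as in (9).
for rho in (20, 32, 40):
    pb = erlang_b(32, rho)
    print(rho, round(pb, 3), round((1 - pb) * rho / 32, 3))
```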
Table 23. Notation.

Symbol       Meaning
µa           Aggregate network utilization
µc           Per-circuit utilization
µ            Total network utilization
Pb           Call blocking probability
ρ            Offered traffic load
m            Number of circuits
E[X]         Average file size
rc           Circuit rate
ρ′           Fractional offered load
χ            Cross-over file size
E[Tsetup]    Average call setup delay
Tprop        Round-trip propagation delay
Figure 73. A model for a circuit-switched node operated in call-blocking mode
(offered load ρ enters; the carried load (1 − Pb) × ρ is spread over circuits 1…m;
the load Pb × ρ is blocked).
In Fig. 73, the offered load is ρ Erlang and the number of available circuits is m
(the Erlang is the unit of traffic load: 1 Erlang = 1 call/sec × 1 sec/call). Since the blocking
probability is Pb (given in (8)), the effective load is (1 − Pb) × ρ. This effective load is
equally partitioned among the m circuits. Hence the aggregate network utilization, µa, is
given by:

µa = (1 − Pb) × ρ / m        (9)

Since we restrict the use of circuits to only files larger than some crossover file size,
χ, we can compute the fractional offered load ρ′ and the average file size E[X | X ≥ χ] if
we know the distribution of file sizes. Reference [56] suggests a Pareto distribution for
file sizes. Using this distribution, we compute the fractional offered load as:

ρ′ = (ρ / E[X]) × P(X ≥ χ) × E[X | X ≥ χ]
   = ρ × ((α − 1) / (α k)) × (k/χ)^α × (α χ / (α − 1))
   = ρ (k/χ)^(α−1)        (10)

where α, the shape parameter, is 1.06, k, the scale parameter, is 1000 bytes, as com-
puted in [56], and ρ is the total offered load.

As mentioned earlier, the total network utilization has two components: aggregate net-
work utilization µa and per-circuit utilization µc. Per-circuit utilization, µc, is given by:

µc = E[Ttransfer] / (E[Tsetup] + E[Ttransfer]), where E[Ttransfer] = E[X] / rc        (11)

Combining the two components of utilization, we obtain the total utilization µ as:

µ = µa × µc = ((1 − Pb) × ρ′ / m) × (E[X | X ≥ χ] / rc) / (E[Tsetup] + E[X | X ≥ χ] / rc)        (12)

We plot the total utilization in Fig. 74 for different Pb, ρ and Tprop. As the crossover
file size χ increases, the plots show utilization increasing because of the second factor,
i.e., the per-circuit utilization increases. However, the drop in the offered load and the cor-
responding drop in the aggregate utilization slow the increase of the total utilization, mak-
ing it level off at some value below 1 or even drop slightly. The overall effect is that the
total utilization reaches a maximum value at some crossover file size. For a blocking
probability of 0.3, this size is less than 0.5 MB in the metro area (Tprop = 0.1 ms), while
in the wide area (Tprop = 50 ms) it is around 1.5 MB. When the blocking proba-
bility is reduced to 0.01, both sizes are even smaller.

Figure 74. Crossover file size vs. total utilization (hardware signaling).

For the purpose of comparison, we plot the total utilization for pure software-based
signaling in Fig. 75. Because of the higher call setup overhead, the per-circuit utilization is
poor for small crossover file sizes, and the total utilization reaches its maximum value at a
larger crossover file size: for a blocking probability of 0.3, the size is 7 MB, and for a block-
ing probability of 0.01, it is around 3 MB. A larger crossover file size means that fewer
files can be transferred over end-to-end circuits (refer to Fig. 72).

Figure 75. Crossover file size vs. total utilization (software signaling).
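Equations (9)-(12) can be combined in a few lines to reproduce the shape of Figs. 74-75: utilization first rises with the crossover file size and then levels off or drops. A sketch; the trunk size m = 32, offered load ρ = 40 Erlangs, circuit rate 1 Gbps, and setup delay 1 ms are illustrative assumptions, not values from the text — only α = 1.06 and k = 1000 bytes come from [56]:

```python
def erlang_b(m, rho):
    # Stable recursion for the Erlang-B blocking probability of (8).
    b = 1.0
    for k in range(1, m + 1):
        b = rho * b / (k + rho * b)
    return b

ALPHA, K = 1.06, 1000.0      # Pareto shape/scale parameters from [56] (bytes)

def total_utilization(chi, rho, m, rate_bps, setup_s):
    # (10): fractional offered load from files larger than chi
    rho_f = rho * (K / chi) ** (ALPHA - 1)
    pb = erlang_b(m, rho_f)
    mu_a = (1 - pb) * rho_f / m                  # (9) aggregate utilization
    ex_cond = ALPHA * chi / (ALPHA - 1)          # E[X | X >= chi] for a Pareto
    transfer = ex_cond * 8 / rate_bps            # seconds to send the average file
    mu_c = transfer / (setup_s + transfer)       # (11) per-circuit utilization
    return mu_a * mu_c                           # (12) total utilization

# Illustrative scenario: 32 circuits at 1 Gbps, 1 ms end-to-end setup delay.
for chi in (1e5, 1e6, 1e7, 1e8):                 # crossover file size in bytes
    print(chi, round(total_utilization(chi, 40, 32, 1e9, 1e-3), 3))
```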
From Figs. 74 and 75, we observe that as the load ρ increases, the utilization
µ also increases. For any given utilization, we can calculate the load needed to achieve
it. In Table 24, we list the load needed to achieve a given utilization under
different scenarios. For blocking probability Pb = 0.3, we choose µ = 90%; for
Pb = 0.01, we choose µ = 80%. We must sacrifice utilization to achieve a low block-
ing probability.
The implication of these numbers is that a large number of interfaces are needed at the
circuit switches leading to the shared interface to create the aggregate load needed to
achieve the desired utilization, as shown in Fig. 76. Since most of the cost of a switch lies
in its interfaces, hardware-accelerated signaling effectively enables the cre-
ation of cheaper switches. This allows for the creation of connection-oriented networks
with an architecture more like existing data networks than today's telephone net-
works: a typical central office in a telephone network has thousands of lines (interfaces),
while an Ethernet switch in an enterprise department has 24 or 48 interfaces. The lower
per-switch call setup delay enabled by hardware signaling implies that a much larger num-
ber of hops is acceptable in such connection-oriented networks. For example, a wide-area
traceroute through the Internet shows 15-20 routers, while domestic telephone calls seldom
pass through more than 4-5 switches at which signaling processing is done.
This architecture of smaller switches (fewer interfaces) but more hops lends itself to a more
distributed and, from a per-enterprise perspective, cheaper connection-oriented network. Thus
the implications of hardware signaling are quite profound.

Table 24. Number of circuits needed for a given utilization.

                                    Hardware-accelerated signaling    Pure software signaling
Pb = 0.3,  µ = 90%   Tprop = 0.1 ms   ρ = 35                            ρ = 220
                     Tprop = 50 ms    ρ = 42                            ρ = 480
Pb = 0.01, µ = 80%   Tprop = 0.1 ms   ρ = 90                            ρ = 200
                     Tprop = 50 ms    ρ = 102                           ρ = 250

Figure 76. Pure software signaling needs more client-side interfaces to create the
aggregate load to achieve the desired utilization. (a) With hardware-accelerated
signaling. (b) With pure software signaling.
8.2 Hardware-accelerated signaling and network survivability
Network survivability is defined as the ability of a network to recover from failures.
It is a crucial requirement in high-speed networks: a single optical
fiber cut or core switch failure may cause significant data loss.
There are two mechanisms to implement network survivability: protection and restora-
tion. Protection requires a pre-assigned secondary path for each connection. When a failure
occurs, the protected connection can be switched to the secondary path quickly. Since the
secondary path is pre-assigned, no signaling procedures are involved in the switchover,
which makes it fast. However, protection typically requires 100% resource redundancy.
For example, SONET/SDH Automatic Protection Switching (APS) requires a primary path
and a secondary path for each connection; user traffic is transmitted on both paths simul-
taneously (1+1 protection). Once a failure occurs, the connection can be switched from the
primary path to the secondary path in less than 50 ms. On the other hand, restoration does
not require a secondary path to be pre-assigned. Instead, the nodes try to re-establish the
affected connection along a secondary path after a failure occurs. The route of the second-
ary path can be pre-calculated or dynamically calculated after the failure. Signaling proce-
dures are involved in restoration. The advantage of restoration is that it does not require a
pre-assigned secondary path, and hence achieves higher resource utilization; however,
the restoration time cannot be guaranteed. Both protection and restoration can be
implemented at the link level or at the path level. In link-level mechanisms, the nodes adja-
cent to the failure initiate the recovery procedures, making the recovery local.
In path-level mechanisms, the nodes adjacent to the failure notify the end points of the path
upon detecting a failure; recovery procedures are then initiated by the end hosts and take
place over a wider area.
Network survivability is one of the main applications of GMPLS signaling. Accord-
ingly, the IETF CCAMP working group is active in standardizing/extending GMPLS sig-
naling protocols for network survivability. Several IETF drafts have been proposed, [59]-
[62], defining the terminology, providing a functional description, defining the necessary sig-
naling extensions, and analyzing performance. Meanwhile, many technical papers have
been published [63]-[67] on this topic.

In this section, we analyze the implications of hardware-accelerated signaling for
network survivability. We expect hardware-accelerated signaling to enable fast restora-
tion, achieving both high utilization and low recovery delay. The notation we use is shown
in Table 25. We consider only single link failures. Although SONET/SDH APS imple-
mentations achieve a recovery delay of 50 ms, recent research suggests that the
50 ms recovery delay is not necessary. In [68], the author concludes that a 200 ms recovery
delay would not jeopardize services (data, video, voice). We therefore also set our recovery
delay target to 200 ms.
Table 25. Notation.

Symbol    Meaning
n         The number of nodes
l         The number of links
Table 25. Notation (continued).

Symbol    Meaning
d         The average degree
Hp        The average number of hops on the primary path
Hs        The average number of hops on the secondary path
Hr        The average number of hops on the link restoration ring path
Lp        The total load on the primary paths
Ls        The recovery load on the secondary paths
Lp′       The total load on the primary paths after recovery
L̄p        The average load on a link
L̄s        The average recovery load on a link
Tprop     The average propagation delay along a hop
Tsig      The average delay to send a signaling message from one node to the next
Ttotal    The total recovery delay

Fig. 77 shows a sample network with a single link failure. The network parameters are
as follows: n = 13, l = 22, Hp = 2.4, Hs = 3.5, and d = 3.4.

Figure 77. A sample network with a single link failure (nodes: SEA, SFR, LOS, SND,
LAS, DAL, HOU, CHI, ATL, MIA, BOS, NYC, WAS).

8.2.1 Utilization and delay analysis for path restoration

We assume that the traffic between all pairs of nodes is the same (normalized to 1).
There are n(n − 1)/2 bi-directional connections in a network with n nodes. Since the average
number of hops on the primary path of a connection is Hp, i.e., on average each connection
involves Hp links, if we assume all connections are equally distributed among the l links, the
average load on a single link, L̄p (by definition, it is also the number of connections car-
ried on this link), can be represented as:

L̄p = (n(n − 1)/2) × Hp / l = n(n − 1)Hp / (2l)        (13)
Once a link failure occurs, all affected connections need to be re-established. In path
restoration, the nodes adjacent to the failure notify the end points of all connections
affected by the failure. Once the failure notification is received by the end points, two proce-
dures are triggered: releasing the resources along the primary path and establishing a
connection along the secondary path. These two procedures take place for all affected con-
nections. We assume that a secondary route has been pre-calculated for each connection.
The total recovered load on these secondary paths is given by:
Ls = L̄p × Hs = n(n − 1)HpHs / (2l)        (14)
where Hs is the average number of hops along a secondary path, which is typically larger
than that along the primary path, Hp.
At the same time, the resources allocated for the affected connections on the primary
paths are released, while the resources for other connections are not affected. Accordingly,
the total remaining load on the primary paths is given by:
Lp′ = Lp − L̄p × Hp = L̄p × (l − Hp) = n(n − 1)Hp(l − Hp) / (2l)        (15)
Since path restoration takes place globally, we further assume that the restored traffic
is equally distributed among the remaining (l − 1) links (this assumption may be optimis-
tic; it is possible that not all links are involved in the restoration of a single link failure).
The average load on a link, after the restoration, is given by:
L̄s = (Lp′ + Ls) / (l − 1) = n(n − 1)Hp(l + Hs − Hp) / (2l(l − 1))        (16)
L̄p in (13) is the traffic load on a link under normal conditions; L̄s in (16) is the traffic
load on a link after restoration, which includes the unaffected normal traffic and the restored
traffic. Accordingly, the link utilization can be represented as:

µp = L̄p / L̄s = (l − 1) / (l + (Hs − Hp))        (17)

(Hs − Hp) is mainly determined by the connectivity of a network; the higher the con-
nectivity, the smaller the difference between Hs and Hp. The connectivity of a network is
related to the average degree of the network (d = 2l/n).

Using the network in Fig. 77 as an example, where l = 22, Hs = 3.5, and Hp = 2.4,
we have µp = 91%. In other words, reserving 9% of the total bandwidth on each link for
path restoration can enable the network in Fig. 77 to recover from a single link failure.
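As a check, equations (13)-(17) can be evaluated directly for the sample network of Fig. 77; a sketch reproducing the 91% figure:

```python
from math import isclose

# Sample network parameters from Fig. 77
n, l, Hp, Hs = 13, 22, 2.4, 3.5

Lp_bar = n * (n - 1) * Hp / (2 * l)      # (13) average load per link (L̄p)
Ls = Lp_bar * Hs                         # (14) recovered load on secondary paths
Lp_prime = Lp_bar * (l - Hp)             # (15) remaining load on primary paths
Ls_bar = (Lp_prime + Ls) / (l - 1)       # (16) average load per link after restoration
mu_p = Lp_bar / Ls_bar                   # (17) link utilization

print(round(mu_p, 3))                    # ~0.909, i.e. the 91% quoted in the text
assert isclose(mu_p, (l - 1) / (l + Hs - Hp))   # closed form of (17)
```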
In order to analyze the recovery delay, it is necessary to review the sequence of opera-
tions that need to be performed after a link failure. Typically there are three operations:
• Failure detection
• Failure notification
• Recovery
The nodes adjacent to a failure can detect it through Loss of Light (LOL),
Loss of Signal (LOS), Signal Degrade (SD), etc. The nodes then generate Notify messages
directed to the end points responsible for restoring the connections. The Notify message
has been added to RSVP-TE for GMPLS to provide a mechanism for failure notification.
Unlike other RSVP messages, the Notify message can be "targeted" to a node other than
the immediate upstream or downstream neighbor. All intermediate nodes simply forward the
message, unmodified, towards the target. No processing delays are incurred for Notify
messages at intermediate nodes. Therefore it is reasonable to assume that failure detection
takes no time, and that the delay for failure notification is the time to propagate the Notify mes-
sage from the node adjacent to the failure to the end point responsible for restoring the
connection. The average failure notification delay is Tprop × Hp / 2.

Once the end hosts receive the Notify messages, they initiate connection setup on the sec-
ondary paths. The total delay to re-establish a connection includes the processing and
transport of Path messages in the downstream direction, and the processing and
transport of Resv messages in the upstream direction. The average recovery delay can
therefore be calculated as Tsig × 2Hs. Following the analysis in Chapter 7, we can cal-
culate Tsig, the delay to transmit an RSVP-TE signaling message from one node to the
next, by using (7). We assume that hardware-accelerated signaling and the in-band signaling
transport option are used. Considering the network in Fig. 77, we further assume that
the RSVP-TE messages are transmitted in the wide area and that µtx ≪ µproc. From Fig. 68,
we can estimate that Tsig = 25 ms. The total recovery delay is given by:

Ttotal = (Tprop × Hp) / 2 + Tsig × 2Hs        (18)

In Fig. 77, with Tprop = 5 ms, Hp = 2.4, and Hs = 3.5, we have Ttotal = 181 ms, which
is less than 200 ms. In contrast, if a pure software signaling implementation is used,
then according to measurements provided by a leading switch vendor(1), Tsig > 300 ms.
Using the same parameters as in Fig. 77, we have Ttotal = 2106 ms, or around 2 seconds.

(1) We cannot disclose the vendor due to the Non-Disclosure Agreement.
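Using the parameters quoted above, (18) reproduces both totals with a one-line function; a sketch:

```python
def recovery_delay_ms(t_prop_ms, hp, hs, t_sig_ms):
    # (18): failure notification over half the primary path on average,
    # then Path + Resv signaling over the secondary path (2 * Hs hops).
    return t_prop_ms * hp / 2 + t_sig_ms * 2 * hs

# Parameters of Fig. 77: Tprop = 5 ms, Hp = 2.4, Hs = 3.5
print(recovery_delay_ms(5, 2.4, 3.5, 25))    # hardware signaling: 181.0 ms
print(recovery_delay_ms(5, 2.4, 3.5, 300))   # software signaling: 2106.0 ms
```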
Chapter 9 Conclusions and Future Work

Envisioning possible new applications of connection-oriented networks, we recognize the
importance of improving the call handling capacities of switches. We propose to improve
call handling capacities by implementing signaling protocols in hardware. As a proof
of concept, we design and implement OCSP, a signaling protocol specifically designed for
SONET switches, on a WILDFORCE FPGA evaluation board. We achieve a call handling
rate of 150,000 calls/sec and a per-switch call processing delay of 6.6 µs. To answer the
challenge of implementing a widely available signaling protocol in hardware, we
implement a subset of RSVP-TE on a Xilinx Virtex-II FPGA device. This subset is care-
fully defined so that most time-critical functions can be handled by hardware. Timing
simulation shows that our implementation, with a fully pipelined architecture, can achieve
a call handling rate of 250,000 calls/sec and a per-switch call processing delay of 7.2 µs.
Based on our implementations, we conclude that hardware-accelerated signaling is
feasible and demonstrates a three-orders-of-magnitude improvement in performance. Hard-
ware-accelerated signaling will fundamentally change our current view of connection-ori-
ented networks. For example, it is widely believed that an end-to-end connection is
justifiable from a utilization perspective only for applications with long call holding
times. We argue that with hardware-accelerated signaling, we can set up a connection even
for applications of short duration. We use file transfer as an example and define a cross-
over file size: if a file is larger than this crossover file size, an end-to-end connection will
be attempted. In addition, we propose a new network architecture with smaller switches but
more hops. This architecture lends itself to a more distributed and, from a per-enter-
prise perspective, cheaper connection-oriented network. Hardware-accelerated signaling can also
be used for network survivability. A dynamically established secondary path (the path restora-
tion scheme) can provide the required reliability with less resource redundancy
than 1+1 protection. And thanks to hardware-accelerated signaling, the recovery delay can
be kept within 200 ms.
To complete the analysis of end-to-end setup delays, we compare different signaling
transport options (in-band vs. out-of-band signaling). Based on the numerical
results, we conclude that for hardware-accelerated signaling, the in-band option is the
better choice; for pure software signaling, out-of-band signaling suffices in the metro area,
while in the wide area in-band signaling is again the better choice.
Our current work on the applications and implications of hardware-accelerated signal-
ing is only a starting point. The impact of hardware-accelerated signaling is quite far-
reaching. We will propose new network architectures and applications that can fully
exploit the benefit of hardware-accelerated signaling.
References
[1] http://www.internetworldstats.com
[2] http://www.whois.sc/internet-statistics/country-ip-counts.html
[3] http://abilene.internet2.edu/
[4] http://www.internet2.edu/
[5] M. Veeraraghavan, X. Zheng, H. Lee, M. Gardner, and W. Feng, “CHEETAH: Circuit-
switched High-speed End-to-End Transport ArcHitecture,” Proc. of Opticomm2003,
October 15-18, Dallas, TX.
[6] T. Russell, Signaling System #7, 2nd edition, McGraw-Hill, New York, 1998.
[7] The ATM Forum Technical Committee, “User Network Interface Specification v3.1,”
af-uni-0010.002, Sept. 1994.
[8] The ATM Forum Technical Committee, “Private Network-Network Specification In-
terface v1.0F (PNNI 1.0),” af-pnni-0055.000, March 1996.
[9] L. Andersson, P. Doolan, N. Feldman, A. Fredette, B. Thomas, “LDP Specification,”
IETF RFC 3036, Jan. 2001.
[10] B. Jamoussi, L. Andersson, R. Callon, R. Dantu, L. Wu, P. Doolan, T. Worster, N.
Feldman, A. Fredette, M. Girish, E. Gray, J. Heinanen, T.Kilty, A. Malis, “Constraint-
Based LSP Setup using LDP,” IETF RFC 3212, Jan. 2002.
[11] D. Awduche, L. Berger, D. Gan, T. Li, et al., “RSVP-TE: Extensions to RSVP for LSP
Tunnels,” IETF RFC 3209, Dec. 2001.
[12] E. Mannie (editor), “Generalized Multi-Protocol Label Switching (GMPLS) Architec-
ture,” IETF RFC 3945, Oct. 2004.
[13] L. Berger, “Generalized Multi-Protocol Label Switching (GMPLS) Signaling Func-
tional Description,” IETF RFC 3471, Jan. 2003.
[14] P. Ashwood-Smith, L. Berger, “Generalized Multi-Protocol Label Switching (GM-
PLS) Signaling Constraint-based Routed Label Distribution Protocol (CR-LDP) Ex-
tensions,” IETF RFC 3472, Jan. 2003.
[15] L. Berger, “Generalized Multi-Protocol Label Switching (GMPLS) Signaling Re-
source ReserVation Protocol-Traffic Engineering (RSVP-TE) Extensions,” IETF RFC
3473, Jan. 2003.
[16] R. Braden, L. Zhang, S. Berson, S. Herzog, S. Jamin, “Resource ReSerVation Protocol
(RSVP) - Version 1 Functional Specification,” IETF RFC 2205, Sept. 1997.
[17] K. Kompella (editor), Y. Rekhter, “OSPF Extensions in Support of Generalized Multi-
Protocol Label Switching,” IETF Internet Draft, draft-ietf-ccamp-ospf-gmpls-exten-
sions-12.txt, Oct. 2003.
[18] J. Lang, “Link Management Protocol (LMP),” IETF Internet Draft, draft-ietf-ccamp-
lmp-10.txt, Oct. 2003.
[19] S. K. Long, R. R. Pillai, J. Biswas, T. C. Khong, “Call Performance Studies on the
ATM Forum UNI Signalling,” http://www.krdl.org.sg/Research/Publications/Papers/
pillai_uni_perf.pdf.
[20] M. Veeraraghavan and R. Karri, “Towards enabling a 2-3 orders of magnitude im-
provement in call handling capacities of switches,” NSF proposal 0087487, 2001.
[21] T. Hellstern, “Fast SVC Setup,” ATM Forum Contribution 97-0380, 1997.
[22] L. Roberts, “Inclusion of ABR in Fast Signaling,” ATM Forum Contribution 97-0796,
1997.
[23] P. Pan and H. Schulzrinne, “YESSIR: A Simple Reservation Mechanism for the Inter-
net,” IBM Research Report, RC 20967, Sept. 2, 1997.
[24] M. Veeraraghavan, G. L. Choudhury, and M. Kshirsagar, “Implementation and Analy-
sis of Parallel Connection Control (PCC),” Proc. of IEEE Infocom'97, Kobe, Japan,
Apr. 7-11, 1997.
[25] P. Boyer and D. P. Tranchier, “A Reservation Principle with Applications to the ATM
Traffic Control,” Computer Networks and ISDN Systems Journal 24, 1992, pp. 321-
334.
[26] I. Baldine, G. N. Rouskas, H. G. Perros, D. Stevenson, “JumpStart: A Just-in-Time Sig-
naling Architecture for WDM Burst-Switched Networks,” IEEE Communication
Magazine, pages 82-89, Feb. 2002.
[27] D. Gluskin and I. Cidon, “A Hardware Signaling Paradigm for Fine-Grained Resource
Reservation”, Technion, CCIT Publication # 433, June 2003.
[28] M. Benz, “An Architecture and Prototype Implementation for TCP/IP Hardware Sup-
port,” Proc. of TERENA Networking Conference 2001, May 14-17, 2001, Antalya,
Turkey.
[29] 10 Gigabit Ethernet Alliance white paper, “Introduction to TCP/IP Offload Engine
(TOE),” Apr. 2002, http://www.10gea.org/SP0502IntroToTOE_F.pdf.
[30] P. Molinero-Fernandez, N. Mckeown, “TCP Switching: Exposing Circuits to IP,”
IEEE Micro, vol. 22, issue 1, Jan./Feb. 2002, pp. 82-89.
[31] H. Wang, M. Veeraraghavan, and R. Karri, “A Hardware Implementation of a Signaling
Protocol,” in Proc. of Opticomm 2002, July 29-Aug. 2, 2002, Boston, MA.
[32] H. Wang, M. Veeraraghavan, T. Li, “Specification of a subset of GMPLS RSVP-TE
for a hardware-accelerated implementation,” http://eeweb1.poly.edu/networks/papers/
rsvp-subset.pdf, Nov. 2002.
[33] H. Wang, M. Veeraraghavan, R. Karri, “Detailed description of GMPLS RSVP-TE
signaling procedures for hardware implementation,” http://eeweb1.poly.edu/networks/
papers/procedures/procedures.pdf, Dec. 2002.
[34] H. Wang, R. Karri, M. Veeraraghavan, and T. Li, “Hardware-Accelerated Implemen-
tation of the RSVP-TE Signaling Protocol,” Proc. of IEEE ICC2004, June 20-24,
2004, Paris, France.
[35] M. Waldvogel, G. Varghese, J. Turner, B. Plattner, “Scalable High Speed IP Routing
Lookups,” Proc. ACM SIGCOMM’97, Sept. 14-18, 1997, Palais des Festivals, France.
[36] P. Gupta, S. Lin, N. McKeown, “Routing Lookups in Hardware at Memory Access
Speeds,” IEEE INFOCOM 1998, pp. 1240-1247, March 29-April 2, 1998, San Fran-
cisco, USA.
[37] R. Braden, D. Clark, S. Shenker, “Integrated Services in the Internet Architecture: an
Overview,” IETF RFC 1633, June 1994.
[38] Eric Mannie, D. Papadimitriou, “Generalized Multi-Protocol Label Switching (GM-
PLS) Extensions for SONET and SDH Control,” IETF RFC 3946, Oct. 2004.
[39] D. Papadimitrious, “Generalized MPLS (GMPLS) Signaling Extensions for G.709
Optical Transport Networks Control,” IETF Internet Draft, draft-ietf-ccamp-gmpls-
g709-08.txt, Sept. 2004.
[40] L. Berger, D. Gan, et al., “RSVP Refresh Overhead Reduction Extensions,” IETF RFC
2961, April 2001.
[41] Xilinx datasheet, “Virtex-II 1.5V Field-Programmable Gate Arrays, Advanced Prod-
uct Specification,” Oct. 2001, http://direct.xilinx.com/bvdocs/publications/ds031.pdf.
[42] Xilinx user guide, “Virtex-II Platform FPGA User Guide,” Nov. 2002, http://direct.xil-
inx.com/bvdocs/userguides/ug002.pdf.
[43] Xilinx application note, “Two Flows for Partial Reconfiguration: Module Based or
Difference Based,” Nov. 2003. http://www.xilinx.com/bvdocs/appnotes/xapp290.pdf.
[44] http://www.sycamorenet.com
[45] Agilent Technologies datasheet, “1 x 9 Fiber Optic Transceivers for Gigabit Ethernet,
HFBR-53D5 Family,” Nov. 1999.
[46] Agilent Technologies datasheet, “Gigabit Ethernet and Fibre Channel SerDes ICs,
HDMP-1636A Transceiver,” April 2002.
[47] LSI Logic datasheet, “8101/8104 Gigabit Ethernet Controller,” Nov. 2001.
[48] IDT datasheet, “3.3 Volt High-Density SuperSync™ II 36-bit FIFO - 64K x 36,
IDT72V36100,” Feb. 2003.
[49] IDT datasheet, “Network Search Engine - 64K x 72 Entries, IDT75P52100,” Feb.
2002.
[50] IDT datasheet, “3.3V Synchronous ZBT™ SRAMs, 2.5V I/O, Burst Counter, Pipe-
lined Outputs - 128K x 36, IDT71V2556S,” May 2002.
[51] Vitesse datasheet, “64x64 STS-12/STM-4 TSI Switch Fabric, VSC9182,” Dec. 2001.
[52] PCI Special Interest Group, “PCI Local Bus Specification, Product Version, Revision
2.1,” June 1, 1995.
[53] M. E. Crovella and A. Bestavros, “Self-similarity in World Wide Web Traffic Evi-
dence and Possible Causes,” Proc. of the SPIE International Conference on Perfor-
mance and Control of Network Systems, Nov., 1997.
[54] D. Gross, C. M. Harris, Fundamentals of Queueing Theory, 3rd edition, Wiley-Inter-
science, 1998.
[55] http://www.nlanr.net.
[56] V. Paxson and S. Floyd, “Wide-Area Traffic: The Failure of Poisson Modeling,” IEEE/
ACM Transactions on Networking, Vol. 3, June 1995.
[57] H.Wang, M.Veeraraghavan and R. Karri, “A Dynamic Circuits Based Wide-Area SAN
Solution,” the 1st International Workshop on Optical Networking Technologies for
Global SAN solutions, Oct. 13, 2002, Dallas, TX.
[58] Alberto Leon-Garcia, Indra Widjaja, Communication Networks - Fundamental Con-
cepts and Key Architectures, McGraw-Hill Higher Education.
[59] E. Mannie, D. Papadimitriou, “Recovery (Protection and Restoration) Terminology
for Generalized Multi-Protocol Label Switching (GMPLS),” IETF Internet Draft,
draft-ietf-ccamp-gmpls-recovery-terminology-05.txt, Oct. 2004.
[60] D. Papadimitriou, Eric Mannie, “Analysis of Generalized Multi-Protocol Label
Switching (GMPLS)-based Recovery Mechanisms (including Protection and Restora-
tion),” IETF Internet Draft, draft-ietf-ccamp-gmpls-recovery-analysis-04.txt, Oct.
2004.
[61] J. P. Lang, B. Rajagopalan, “Generalized Multi-Protocol Label Switching (GMPLS)
Recovery Functional Specification,” IETF Internet Draft, draft-ietf-ccamp-gmpls-re-
covery-functional-03.txt, Oct. 2004.
[62] J.P. Lang, Y. Rekhter, D. Papadimitriou, “RSVP-TE Extensions in support of End-to-
End GMPLS-based Recovery,” IETF Internet Draft, draft-ietf-ccamp-gmpls-recovery-
e2e-signaling-01.txt, May 2004.
[63] J. Wang, L. Sahasrabuddhe, B. Mukherjee, “Path vs. subpath vs. link restoration for
fault management in IP-over-WDM networks: performance comparisons using GM-
PLS control signaling,” IEEE Communication Magazine, Vol. 40, issue 11, pp. 80-
87, Nov. 2002.
[64] A. Banerjee, J. Drake, J. Lang, “Generalized Multiprotocol Label Switching: An
Overview of Signaling Enhancements and Recovery Techniques,” IEEE Communica-
tion Magazine, Jan. 2001.
[65] M. Alicherry, H. Nagesh, C. Phadke, and V. Poosala, “FASTeR: sub-50-ms shared res-
toration in mesh networks,” Journal of Optical Networking, Vol.2, No.11, Nov. 2003.
[66] G. V. Kaigala, W. D. Grover, “On the efficacy of GMPLS auto-reprovisioning as a
mesh-network restoration mechanism,” Proceedings of IEEE GLOBECOM 2003,
Vol.7, Dec. 2003.
[67] G. Li, J. Yates, R. Doverspike, and D. Wang, “Experiments in fast restoration using
GMPLS in optical/electronic mesh networks,” Proceedings of OFC 2001, Vol.4, pp.
PD34-(1-3), March 2001.
[68] Jeff Schallenberg, “Is 50 ms Restoration Necessary?” IEEE Bandwidth Management
Workshop IX, June 25-28, 2001, Montebello, Canada.