
JOURNAL OF LIGHTWAVE TECHNOLOGY, VOL. 19, NO. 4, APRIL 2001 433

Traffic Adaptive WDM Networks: A Study of Reconfiguration Issues

Ilia Baldine and George N. Rouskas, Member, IEEE

Abstract—This paper studies the issues arising in the reconfiguration phase of broadcast optical networks. Although the ability to dynamically optimize the network under changing traffic conditions has been recognized as one of the key features of multiwavelength optical networks, this is the first in-depth study of the tradeoffs involved in carrying out the reconfiguration process. We develop and compare reconfiguration policies to determine when to reconfigure the network, and we present an approach to carry out the network transition by describing a class of strategies that determine how to retune the optical transceivers. We identify the degree of load balancing and the number of retunings as two important, albeit conflicting, objectives in the design of reconfiguration policies, naturally leading to a formulation of the problem as a Markovian decision process. Consequently, we develop a systematic and flexible framework in which to view and contrast reconfiguration policies. We show how an appropriate selection of reward and cost functions can be used to achieve the desired balance among various performance criteria of interest. We conduct a comprehensive evaluation of reconfiguration policies and retuning strategies and demonstrate the benefits of reconfiguration through both analytical and simulation results. The result of our work is a set of practical techniques for managing the network transition phase that can be directly applied to networks of large size. Although our work is in the context of broadcast networks, the results can be applied to any wavelength-division multiplexing network where it is necessary to multiplex traffic from a large user population into a number of wavelengths.

Index Terms—Broadcast optical networks, Markov decision process, reconfiguration policies, wavelength-division multiplexing (WDM).

I. INTRODUCTION

ONE OF the key features of multiwavelength optical networks is rearrangeability [4], i.e., the ability to dynamically optimize the network for changing traffic patterns, or to cope with failure of network equipment. This ability arises as a consequence of the independence between the logical connectivity and the underlying physical infrastructure of fiber glass. By employing tunable optical devices, the assignment of transmitting or receiving wavelengths to the various network nodes may be updated on the fly, allowing the network to closely track changing traffic conditions.

While the rearrangeability property makes it possible to design traffic-adaptive self-healing networks, the reconfiguration phase will interfere with existing traffic and disrupt network performance, causing a degradation of the quality of service perceived by the users. The issues that arise in reconfiguring a lightwave network by retuning a set of slowly tunable transmitters or receivers have been studied in the context of multihop networks in [5], [6], and [9]. In [5], the problem of obtaining a virtual topology that minimizes the maximum link flow, given a set of traffic demands, was studied, while in [6], algorithms were developed for minimizing the number of branch-exchange operations required to take the network from an initial to a target virtual topology once the traffic pattern changes. The objective of [9], on the other hand, was to obtain near-optimal policies to dynamically determine when and how to reconfigure the network.

Manuscript received March 15, 2000; revised January 3, 2001. This work was supported by the National Science Foundation under Grant NCR-9701113.

The authors are with the Department of Computer Science, North Carolina State University, Raleigh, NC 27695-7534 USA (e-mail: [email protected]).

Publisher Item Identifier S 0733-8724(01)02752-9.

In this paper, we study the reconfiguration issues arising in single-hop lightwave networks, an architecture suitable for local- and metropolitan-area networks (LANs and MANs) [7]. The single-hop architecture employs wavelength-division multiplexing (WDM) to provide connectivity among the network nodes. The various channels are dynamically shared by the attached nodes, and the logical connections change on a packet-by-packet basis, creating all-optical paths between sources and destinations. Thus, single-hop networks require the use of rapidly tunable optical lasers and/or filters that can switch between channels at high speeds.

When tunability only at one end, say, at the transmitters, is employed, each fixed receiver is permanently assigned to one of the wavelengths used for packet transmissions. In a typical near-term WDM environment, the number of channels supported within the optical medium is expected to be smaller than the number of attached nodes. As a result, each channel will have to be shared by multiple receivers, and the problem of assigning receive wavelengths arises. Intuitively, a wavelength assignment (WLA) must be somehow based on the prevailing traffic conditions. More specifically, the stability condition, derived in [11], for the HiPeR-ℓ reservation protocol for broadcast WDM networks suggests that in determining an appropriate WLA, the objective should be to balance the offered load across all channels, such that each channel carries an approximately equal portion of the overall traffic.¹ But with fixed receivers, any WLA is permanent and cannot be updated in response to changes in the traffic pattern.

Alternatively, one can use slowly tunable, rather than fixed, receivers. We will say that an optical laser or filter is rapidly tunable if its tuning latency (i.e., the time it takes to switch from one wavelength to another) is on the order of a packet transmission time at the high-speed rates at which optical networks are expected to operate. Slowly tunable devices, on the other hand, have tuning times that can be significantly longer. As a result, these devices cannot be assumed "tunable" at the media access level (i.e., for the purposes of scheduling packet transmissions), as this requires fast tunability. Motivation for the use of slowly tunable lasers or filters is provided by two factors. First, they can be significantly less expensive than rapidly tunable devices, making it possible to design lightwave network architectures that can be realized cost effectively. Second, the variation in traffic demands is expected to take place over larger time scales (several orders of magnitude larger than a single packet transmission time). Hence, even very slow tunable devices will be adequate for updating the WLA over time to accommodate varying traffic demands.

¹ This result is intuitive and has been accepted by the research community for years. However, by deriving a stability condition for HiPeR-ℓ [11], our study was the first to quantify the effect of load balancing on the performance of broadcast optical networks.

Assuming an existing WLA and some information about the new traffic demands, a new WLA, optimized for the new traffic pattern, must be determined. We considered this problem in [1], and we proposed an approach to reconfiguring the network that is minimally disruptive to existing traffic. Specifically, we devised the GLPT algorithm for obtaining a new WLA such that a) the new traffic load is balanced across the channels and b) the number of receivers that need to be retuned to take the network from the old to the new WLA is minimized. The specifications of GLPT include a knob parameter, which provides for tradeoff selection between load balancing and number of retunings. In terms of load balancing, the WLA obtained by GLPT is guaranteed to be no more than two times away from the optimal one, in the worst case, regardless of the knob value used. GLPT also leads to a scalable approach to reconfiguring the network, since it tends to select the less utilized receivers for retuning, and since for certain values of the knob parameter the expected number of retunings scales with the number of channels, not the number of nodes in the network. For more details on the operation and performance of the GLPT algorithm, the reader is referred to [1].
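For concreteness, the following Python sketch illustrates the flavor of such an algorithm: a greedy, LPT-style assignment that places heavily loaded receivers first and uses a knob to bias each node toward its current channel, thereby trading load balance against retunings. All names and the specific tolerance rule are illustrative assumptions of ours; the actual GLPT algorithm is specified in [1].

    # Illustrative only -- NOT the GLPT algorithm of [1]. A greedy,
    # LPT-style wavelength assignment: balance per-channel load while a
    # knob biases each receiver toward its current channel (fewer retunings).
    def greedy_wla(rx_load, current_wla, num_channels, knob=0.0):
        # rx_load[i]: traffic destined to node i
        # current_wla[i]: channel of node i under the old WLA
        # knob: 0 = pure load balancing; larger values favor staying put
        channel_load = [0.0] * num_channels
        new_wla = [0] * len(rx_load)
        # Place the most heavily loaded receivers first (LPT order).
        for i in sorted(range(len(rx_load)), key=lambda n: -rx_load[n]):
            least = min(channel_load)
            cur = current_wla[i]
            # Keep the current channel if its load is within a knob-sized
            # tolerance of the least-loaded channel; otherwise move.
            if channel_load[cur] <= least + knob * rx_load[i]:
                chosen = cur
            else:
                chosen = channel_load.index(least)
            new_wla[i] = chosen
            channel_load[chosen] += rx_load[i]
        return new_wla

    def retunings(old_wla, new_wla):
        # The "distance" between two WLAs: receivers that must switch.
        return sum(o != n for o, n in zip(old_wla, new_wla))

With knob = 0, every receiver chases the least-loaded channel (best balance, many retunings); a large knob keeps most receivers in place (few retunings, poorer balance), mirroring the tradeoff that GLPT's knob parameter controls.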

During the reconfiguration phase, while the network makes a transition from one WLA to another, some cost is incurred in terms of packet delay, packet loss, packet desequencing, and the control resources involved in receiver retuning. Clearly, receiver retunings should not be very frequent, since unnecessary retunings affect the performance encountered by the users. Hence, it is desirable to minimize the number of network reconfigurations. However, postponing a necessary reconfiguration also has adverse effects on the overall performance. Since the network does not operate at an optimal point in terms of load balancing, it takes longer to clear a given set of traffic demands, causing longer delays and/or buffer overflows, as well as a decrease in the network's traffic-carrying capacity (refer also to the stability condition in [11]). Similarly, if the decisions are made merely by considering the degree of load balancing, even tiny changes in the traffic demands can lead to constant reconfiguration, thereby significantly hurting network performance. Consequently, it is important to have a performance criterion that can capture the above tradeoffs in an appropriate manner and allow their simultaneous optimization.

In this paper, we develop a novel, systematic, and flexible framework in which to view and contrast reconfiguration policies. Specifically, we formulate the problem as a Markovian decision process and show how an appropriate selection of reward and cost functions can achieve the desired balance between various performance criteria of interest. However, because of the huge state space of the underlying Markov process, it is impossible to directly apply appropriate numerical methods to obtain an optimal policy. We therefore develop an approximate model with a manageable state space, which captures the pertinent properties of the original model. We also study the issues that arise during the transition phase, and we present a class of retuning strategies. Finally, we present a comprehensive evaluation of reconfiguration policies and retuning strategies through both analytical and simulation techniques.

In Section II, we present a model of the broadcast WDM network under study. In Section III, we formulate the reconfiguration problem as a Markovian decision process, and we discuss the issues involved in obtaining an optimal policy. In Section IV, we describe a class of retuning strategies for taking the network from the old to the new WLA. We present numerical results in Section V and conclude this paper in Section VI.

II. THE BROADCAST WDM NETWORK

We consider a packet-switched single-hop lightwave network with N nodes and one transmitter-receiver pair per node. The nodes are physically connected to a passive broadcast optical medium that supports C wavelengths, λ_1, λ_2, …, λ_C. Both the transmitter and the receiver at each node are tunable over the entire range of available wavelengths. However, the transmitters are rapidly tunable while the receivers are slowly tunable. We will refer to this tunability configuration, shown in Fig. 1, as rapidly tunable transmitter, slowly tunable receiver (RTT-STR). Although we will only consider RTT-STR networks in this paper, we note that all our results can be easily adapted to the dual configuration, STT-RTR. Further, the results can be applied to any WDM network where it is necessary to multiplex traffic from a large user population into a number of wavelengths, and they are not limited to broadcast networks.

We represent the current traffic conditions in the network by an N × N traffic demand matrix T = [t_ij]. Quantity t_ij could be a measure of the average traffic originating at node i and terminating at node j, or it could be the effective bandwidth [8] of the traffic from i to j. As traffic varies over time, the elements of matrix T will change. This variation in traffic takes place at larger scales in time; for instance, we assume that changes in the traffic matrix occur at connection request arrival or termination instants. We also assume that the current matrix T completely summarizes the entire history of traffic changes, so that future changes only depend on the current values of the elements of T.

During normal operation, each of the slowly tunable receivers is assumed to be fixed to a particular wavelength. Let λ(i) ∈ {λ_1, …, λ_C} be the wavelength currently assigned to receiver i. A WLA is a partition V = {V_1, V_2, …, V_C} of the set {1, 2, …, N} of nodes, such that V_c, c = 1, …, C, is the subset of nodes currently receiving on wavelength λ_c. This WLA is known to the network nodes, and it is used to determine the target channel for a packet, given the packet's destination. The network operates by having each node employ a media access protocol, such as HiPeR-ℓ, that requires tunability only at the transmitting end. Nodes use HiPeR-ℓ to make reservations, and can schedule packets for transmission using algorithms that can effectively mask the (relatively short) latency of tunable transmitters [10].

Fig. 1. A broadcast WDM network with N nodes and C channels.

We now define the degree of load balancing (DLB) β ≥ 0 for a network with traffic matrix T operating under WLA V through the relation

(1 + β) · (1/C) Σ_{i=1..N} Σ_{j=1..N} t_ij  =  max_{1≤c≤C} Σ_{i=1..N} Σ_{j∈V_c} t_ij.    (1)

The right-hand side of (1) represents the bandwidth requirement of the dominant (i.e., most loaded) channel, while the second term in the left-hand side of (1) represents the lower bound, with respect to load balancing, for any WLA for traffic matrix T. Thus, the DLB β is a measure of how far away WLA V is from the lower bound. If β = 0, then the load is perfectly balanced and each channel carries an equal portion of the offered traffic, while when β > 0, the channels are not equally loaded. In other words, the DLB characterizes the efficiency of the network in meeting the traffic demands denoted by matrix T while operating under WLA V: the higher the value of β, the less efficient the WLA.
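As a concrete illustration of (1), the short Python fragment below computes β for a given traffic matrix and WLA; the function and variable names are our own.

    def dlb(traffic, wla, num_channels):
        # Degree of load balancing per (1): beta = 0 means perfectly
        # balanced; beta = C - 1 means one channel carries all traffic.
        total = sum(sum(row) for row in traffic)
        # Load on channel c: all traffic destined to nodes receiving on c.
        load = [0.0] * num_channels
        for row in traffic:
            for j, t in enumerate(row):
                load[wla[j]] += t
        # Solve (1 + beta) * total / C = max_c load[c] for beta.
        return max(load) * num_channels / total - 1.0

    # Example: 4 nodes, 2 channels; nodes 0,1 receive on channel 0 and
    # nodes 2,3 on channel 1. Each channel carries half the load: beta = 0.
    T = [[0, 1, 2, 1], [1, 0, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]
    print(dlb(T, [0, 0, 1, 1], 2))  # prints 0.0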

In order to more efficiently utilize the bandwidth of the optical medium as traffic varies over time, a new WLA may be sought that distributes the new load more equally among the channels. We will refer to the transition of the network from one WLA to another as reconfiguration. In general, we assume that reconfiguration is triggered by changes in the traffic matrix T. When such a change occurs, the following actions must be taken.

1) A new WLA V' for the new traffic matrix T' must be determined.

2) A decision must be made on whether or not to reconfigure the network by adopting the new WLA.

3) If the decision is to reconfigure, the actual retuning of receivers must take place.

The first issue was addressed in [1], where we developed the GLPT algorithm. It takes as input the current WLA V and the new traffic matrix T', and determines the new WLA V'. Section III addresses the problem of determining whether the changes in traffic conditions warrant the reconfiguration of the network to the new WLA, and Section IV studies the issue of receiver retuning.

III. MARKOV DECISION PROCESS FORMULATION

A. Reconfiguration Policies

We define the state of the network as a tuple (V, T), where V is the current WLA and T is a matrix representing the prevailing traffic conditions. Changes in the network state occur at instants when the matrix T is updated. Recall that we have assumed that future traffic changes only depend on the current values of the elements of T, a reasonable assumption when the traffic is not characterized by long-range dependencies. Therefore, the state process is a semi-Markov process. Let M be the process embedded at instants when the traffic matrix changes. Then, M is a discrete-time Markov process. Our formulation is in terms of the Markov process M.

A network in state (V, T) will enter state (V', T') if the traffic matrix changes to T'. Implicit in the state transition is that the system makes a decision to reconfigure to WLA V'. In order to completely define the Markovian state transitions associated with our model, we need to establish next-WLA decisions. The decision is a function of the current state and is denoted by Δ(V, T). Setting Δ(V, T) = V' implies that if the system is in state (V, T) and the traffic demands change, the network should be reconfigured into WLA V'. Note that WLA V' can be the same as V, in which case the decision is not to reconfigure. Therefore, for each state (V, T), there are two alternatives: either the network reconfigures to the WLA V' obtained by the GLPT algorithm with V and the new traffic matrix T' as inputs (in which case the new state will be (V', T')), or it maintains the current WLA (in which case the new state will be (V, T')). The set of decisions for all network states defines a reconfiguration policy.

To formulate the problem as a Markov decision process, we need to specify reward and cost functions associated with each transition. Consider a network in state (V, T) that makes a transition to state (V', T'). The network acquires an immediate expected reward equal to φ(β'), where φ is a nonincreasing function of β' = β(V', T'), the DLB of WLA V' with respect to the new traffic matrix T'. Also, if V' ≠ V, a reconfiguration cost equal to χ(D) is incurred, where χ is a nondecreasing function of the number D of receivers that have to be retuned to take the network to the new WLA. In other words, a switching cost is incurred each time the network makes a decision to reconfigure. We assume that the rewards and costs are bounded, i.e.,

0 ≤ φ_min ≤ φ(β) ≤ φ_max < ∞,    0 ≤ χ_min ≤ χ(D) ≤ χ_max < ∞    (2)

where φ_min, φ_max, χ_min, and χ_max are real numbers. The problem is how to reconfigure the network sequentially in time, so as to maximize the expected reward minus the reconfiguration cost over an infinite horizon. Let s_m = (V_m, T_m) denote the state of the network immediately after the m-th transition, m = 1, 2, …. Let also ℛ be the set of admissible policies. The network reconfiguration problem can then be formally stated as follows (note that the sequence of states visited depends on the policy R employed).

Problem 3.1: Find an optimal policy R* ∈ ℛ that maximizes the expected reward

J̄(R) = lim_{M→∞} (1/M) E[ Σ_{m=1..M} ( φ(β_m) − χ(D_m) · 1{V_m ≠ V_{m−1}} ) ]    (3)

where β_m = β(V_m, T_m) is the DLB after the m-th transition and D_m is the corresponding number of retunings. The first term on the right-hand side of (3) is the reward obtained by using a particular WLA, and the second term is the cost incurred at each instant of time that reconfiguration is performed. The presence of a reward that increases as the DLB decreases (i.e., as the load is better balanced across the channels) provides the network with an incentive to associate with a WLA that performs well for the current traffic load. On the other hand, the introduction of a cost incurred at each reconfiguration instant discourages frequent reconfigurations. Thus, the overall reward function captures the fundamental tradeoff between the DLB and frequent retunings involved in the reconfiguration problem. In Section V, we motivate the above formulation by showing how an appropriate selection of reward and cost functions yields various performance criteria of interest. Typically, such selection can be based on either measurements of an existing network or simulations.

For the case χ(·) ≡ 0, the problem of finding an optimal policy is trivial, since it is optimal for the network to associate with the WLA that best balances the offered load at each instant in time. This is because the evolution of the traffic matrix T is not affected by the network's actions and reconfigurations are free. However, when χ(·) > 0, there is a conflict between future reconfiguration costs incurred and current reward obtained, and it is not obvious as to what constitutes an optimal policy. We also note that as χ(·) → ∞, the optimal policy would be to never reconfigure, since this is the only policy for which the expected reward in (3) would be nonnegative. Again, however, the point (i.e., the smallest value of χ) at which this policy becomes optimal is not easy to determine, as it depends on the transition probabilities of the underlying Markov chain.

Consider an ergodic, discrete-space discrete-time Markov process with rewards and a set of alternatives per state that affect the probabilities and rewards governing the process. Howard's policy-iteration algorithm [3] can be used to obtain a policy that maximizes the long-term reward in (3) for such a process. Initially, an arbitrary policy is specified, from which all state transition rates are determined. The algorithm then enters its basic iteration cycle, which consists of two stages. The first stage is the value-determination operation, which evaluates the current policy. In the second stage, the policy-improvement routine uses a set of criteria to modify the decisions at each state and obtain a new policy with a higher reward than the original policy. This new policy is used as the starting point for the next iteration. The cycle continues until the policies in two successive iterations are identical. At this point, the algorithm has converged, and the final policy is guaranteed to be optimal with respect to maximizing the reward in (3).
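For concreteness, the sketch below implements the two stages for an ergodic average-reward process. The array layout (P[a, s, s'] for transition probabilities under alternative a and R[a, s] for expected immediate rewards) and all names are our own assumptions, not notation from [3].

    # A compact sketch of Howard's policy iteration (value determination
    # plus policy improvement) for an average-reward Markov decision
    # process. Assumes the chain is ergodic under every policy.
    import numpy as np

    def policy_iteration(P, R):
        # P: (A, S, S) transition matrices; R: (A, S) expected rewards.
        A, S, _ = P.shape
        policy = np.zeros(S, dtype=int)        # arbitrary initial policy
        while True:
            # Value determination: solve v[s] + g = R[s] + sum_j P[s,j] v[j]
            # with the normalization v[0] = 0 (x[0] then holds the gain g).
            Pp = P[policy, np.arange(S)]       # (S, S) under current policy
            M = np.eye(S) - Pp
            M[:, 0] = 1.0                      # column 0 carries the gain g
            x = np.linalg.solve(M, R[policy, np.arange(S)])
            v = x.copy()
            v[0] = 0.0
            # Policy improvement: one-step lookahead in every state,
            # keeping the current alternative on ties to ensure convergence.
            q = R + P @ v                      # (A, S) test quantities
            best = q.max(axis=0)
            keep = q[policy, np.arange(S)] >= best - 1e-12
            new_policy = np.where(keep, policy, q.argmax(axis=0))
            if (new_policy == policy).all():
                return policy, x[0]            # optimal policy and gain
            policy = new_policy

The returned gain x[0] is the long-run average reward per transition, i.e., the quantity maximized in (3).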

A difficulty in applying the policy-iteration algorithm to the Markov process M is that its running time per iteration is dominated by the complexity of solving a number of linear equations on the order of the number of states in the Markov chain. Even if we restrict the elements of traffic matrix T to be integers² and impose an upper bound on the values they can take, the potential number of states is so large that the policy-iteration algorithm cannot be directly applied to anything but networks of trivial size. In the next section, we show how to overcome this problem by making some simplifying assumptions that will allow us to set up a new Markov process whose state space is manageable.

B. Alternative Formulation

Consider a network in state (V, T), and a new traffic matrix T' for which the WLA obtained with the GLPT algorithm is V'. A closer examination of the reward function in (3) reveals that the immediate reward acquired when the network makes a transition does not depend on the actual values of the traffic elements or the actual WLAs involved, but only on the values of the DLBs β(V, T') and β(V', T') and the distance D = d(V, V'), the number of retunings required to go from V to V'. Thus, we make the simplifying assumption that the decision to reconfigure will also depend on the DLBs and the distance only. This is a reasonable assumption, since it is the DLB, not the actual traffic matrix or WLA, that determines the efficiency of the network in satisfying the offered load. Similarly, it is the number of retunings that determines the reconfiguration cost, not the actual WLAs involved.

Based on these observations, we now introduce a new process M̂, embedded, as before, at instants when the traffic matrix changes, as illustrated in Fig. 2. The state of this process is defined to be the tuple (β, D), where β is the DLB achieved by the current WLA with respect to the current traffic matrix and D is the number of retunings required if the network were to reconfigure. Transitions in the new process have the Markovian property, since they are due to changes in the traffic matrix that, in turn, are Markovian. However, as defined, the process is a continuous-state process since, in general, the DLB β is a real number. In order to apply Howard's policy-iteration algorithm, we need a discrete-state process. We obtain such a process by using discrete values for random variable β as follows.

² If the elements of T are real numbers, then M becomes a continuous-state process and the policy-iteration algorithm cannot be applied.

Fig. 2. State of the new Markov process M̂.

By definition [refer to (1)], the DLB β can take any real value between zero and C − 1, where C is the number of channels in the network.³ We now divide the interval [0, C − 1] into a number K of nonoverlapping intervals I_1, I_2, …, I_K, where l_k and u_k are the lower and upper values of interval I_k, and l_1 = 0, u_K = C − 1, and u_k = l_{k+1} for k = 1, …, K − 1. Let β_k denote the midpoint of interval I_k. We now define a new discrete-state process with state (β_k, D). We will use state (β_k, D) to represent any state (β, D) of the continuous-state process such that β ∈ I_k. Clearly, the larger the number K of intervals, the better the approximation.

Before we proceed, we make one further refinement to the new discrete-state process M̂. We note that the GLPT algorithm in [1] is an approximation algorithm for the load balancing problem, and it guarantees that the DLB of the WLA obtained using the algorithm will never be more than 50% away from the degree of load balancing of the optimal WLA. The importance of this result is as follows. Consider a network in which the traffic matrix changes in such a way that the current WLA provides a DLB β for the new traffic matrix such that β ≤ 0.5. Based on the guarantee provided by algorithm GLPT, we can safely assume that the load is well balanced and avoid a reconfiguration. This is because the network will incur a cost for reconfiguring, without any assurance that the new DLB will be less than β. Therefore, we choose to let u_1 = 0.5, and therefore the midpoint for the first interval is β_1 = 0.25. We will call any state (β_1, D) a balanced state, since the offered load is balanced within the guarantees of the GLPT algorithm.

We now specify decision alternatives, as well as reward and cost functions, associated with each transition in the new process M̂. Consider a network in state (β_k, D). At the instant the traffic matrix changes, the network has two options. It may maintain the current WLA, in which case it will make a transition into state (β_l, D'), where β_l is the DLB of the current WLA with respect to the new traffic matrix and D' is the new distance. Or, it will reconfigure into a new WLA. In the latter case, the network will move into state (β_1, D'), since its new DLB is guaranteed to be less than 0.5. When the network makes a transition into a new state, it acquires an immediate expected reward, which is equal to φ(β_l). In addition, if the network chose to reconfigure (i.e., if the transition is into a balanced state (β_1, D') under the reconfigure alternative), a reconfiguration cost equal to χ(D) is incurred.

The transitions out of state (β_k, D) and the corresponding rewards are illustrated in Fig. 3. If the decision of the policy is not to reconfigure, then the process will take one of the transitions indicated by the solid arrows in Fig. 3. Since the network does not incur any reconfiguration cost, the immediate reward acquired is a function of the new DLB in the new state. If, on the other hand, the decision is to reconfigure, the transition out of state (β_k, D) will always take the network to a balanced state with a DLB equal to β_1. These transitions are shown in dotted lines in Fig. 3. A reconfiguration cost is incurred in this case, making the immediate reward equal to φ(β_1) − χ(D).

³ The value β = 0 is achieved when the load is perfectly balanced across the C channels, in which case the expression on the right-hand side of (1) becomes equal to (1/C) Σ_{i,j} t_ij. The value β = C − 1 corresponds to the worst-case scenario where one channel carries all the traffic; in this case, the right-hand side of (1) becomes equal to Σ_{i,j} t_ij.

Fig. 3. Transitions and rewards out of state (β_k, D) of process M̂ under the two decision alternatives. (Note: the labels along the transitions represent rewards, not transition probabilities.)

The new process M̂ is a discrete-space discrete-time Markov process with rewards and two alternatives per state, and we can use the policy-iteration algorithm [3] to obtain an optimal policy off-line and cache its decisions. The optimal policy decisions can then be applied to a real network environment in the following way (refer also to Fig. 2). Consider a network with traffic matrix T operating under WLA V. Let T' be the new traffic matrix and V' be the WLA constructed by algorithm GLPT [1], with V and T' as inputs. Let also D = d(V, V') be the number of receivers that need to be retuned to obtain WLA V' from WLA V. To determine whether the network should reconfigure to the new WLA V', let β = β(V, T') be the current DLB for the network, and suppose that β falls within the k-th interval I_k. By definition of the Markov process M̂, the current network state is modeled by state (β_k, D) of this process. If, under the optimal policy, the decision associated with this state is to reconfigure, then the network must make a transition to the new WLA V'; otherwise, the network will continue operating under the current WLA V.
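At run time, consulting the cached policy then reduces to a table lookup. A minimal sketch follows; the even partition of (0.5, C − 1] into the remaining K − 1 intervals is an assumption of ours, consistent with Section III-B but not prescribed there.

    # Map a measured DLB onto its interval and look up the cached decision.
    def interval_index(beta, K, C):
        # Interval 1 is the balanced interval [0, 0.5]; the rest of
        # [0.5, C - 1] is split evenly (an illustrative assumption).
        if beta <= 0.5:
            return 1
        width = (C - 1 - 0.5) / (K - 1)
        return min(K, 2 + int((beta - 0.5) / width))

    def should_reconfigure(decision_table, beta, D, K, C):
        # decision_table[(k, D)] -> True/False, computed off-line by
        # policy iteration on the approximate process and cached.
        return decision_table[(interval_index(beta, K, C), D)]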

We note that the discrete-space Markov process M̂ is an approximation of the continuous-space process, since, as discussed above, in general the DLB β is a real number between zero and C − 1. We also note that as the number of intervals K → ∞, the discrete-state process approaches the continuous-state one. Therefore, we expect that as the number of intervals increases, the accuracy of the approximation will also increase, and the decisions of the optimal policy obtained through the process M̂ will "converge." This issue will be discussed in more detail in Section V, where numerical results to be presented will show that the decisions of the optimal policy "converge" for relatively small values of K. This is an important observation, since the size of the state space of Markov process M̂ grows with K. By using a relatively small value for K, we can keep the state space of the process to a reasonable size, making it possible to apply the policy-iteration algorithm [3].

IV. THE TRANSITION PHASE AND RETUNING STRATEGIES

In this section, we assume that a decision to reconfigure the network has been made according to the optimal policy described in the previous section, and that the new WLA V' has been obtained through the GLPT algorithm [1]. We are, therefore, concerned with the transition phase, during which the actual retuning of receivers that takes the network from the current WLA V to the new WLA V' takes place. Clearly, in order to complete the reconfiguration, a number of receivers equal to D = d(V, V') must be retuned. A retuning strategy determines when, and in what sequence, these receivers are taken off-line for retuning.

While the receiver of, say, node i is being retuned to a new wavelength, it cannot receive data, and thus, any packets sent to node i are lost. If, on the other hand, the network nodes are aware that node i is in the process of retuning its receiver, they can refrain from transmitting packets to it. In this case, packets destined to node i will experience longer delays while waiting for the node to become ready for receiving again. Moreover, packets for node i arriving at the various transmitters during this time cannot be serviced and may cause buffer overflows. This increase in delay and/or packet loss during the transition phase is the penalty incurred for reconfiguring the network. Once all receivers have been retuned to their new wavelengths, the network has completed its reconfiguration, and normal operation will resume until a new reconfiguration is required.

There is a wide range of strategies for retuning the receivers, mainly differing in the tradeoff between the length of the transition period and the portion of the network that becomes unavailable during this period (see [6] for a discussion of similar issues in multihop networks). One extreme approach would be to simultaneously retune all the receivers that are assigned new channels under V'. The duration of the transition phase is minimized under this approach (it becomes equal to the receiver tuning latency), but a significant fraction of the network may be unusable during this time. At the other extreme, a strategy that retunes one receiver at a time minimizes the portion of the network unavailable at any given instant during the transition phase, but it maximizes the length of this phase (which now becomes equal to the receiver tuning latency times the distance D). Between these two ends of the spectrum lie a range of strategies in which groups of two or more receivers are retuned simultaneously.

Let S be the set of receivers that need to be switched to new wavelengths in order to reconfigure the network to WLA V'. In the most general sense, a retuning strategy is defined as a partition of S into g, 1 ≤ g ≤ D, subsets of receivers S_1, S_2, …, S_g, such that a starting time τ_r is associated with subset S_r. Under this definition, a subset represents a group of receivers that are simultaneously taken off-line for retuning. The retuning of group S_r starts at time τ_r and lasts for a period of time equal to the receiver retuning latency, after which the receivers in the group become operational again. Without loss of generality, we assume that τ_1 ≤ τ_2 ≤ ⋯ ≤ τ_g.

We distinguish between overlapping and nonoverlapping strategies. In the former, the retuning periods of two distinct groups of receivers are allowed to overlap, while in the latter a group S_{r+1} does not start retuning until after group S_r has completed its retuning. Since it does not appear that overlapping strategies offer any advantages over nonoverlapping ones (in fact, they may complicate the management of the transition phase), we only consider nonoverlapping strategies here. However, the number of partitions of the set S can potentially be very large, making it impractical to study all possible nonoverlapping strategies. Hence, we restrict ourselves to the class of parameterized nonoverlapping strategies described in Fig. 4. Specifically, a family of strategies is characterized by a specific value for parameter L, which represents the maximum number of receivers in each group S_r. When a group of L receivers has completed its retuning, another group of L receivers is immediately scheduled for retuning (unless there are fewer than L receivers left to be retuned); this process is repeated until all receivers in S have switched to their new wavelengths under WLA V'. We note that a wide range of strategies can be obtained for the different values of L, 1 ≤ L ≤ D. Furthermore, by varying the value of parameter L, we hope to determine whether there exists a tradeoff between the number of receivers that can be scheduled to retune at the same time and the associated packet losses during the transition phase.

Fig. 4. A class of parameterized retuning strategies.

Fig. 5. Steps of a nonoverlapping retuning strategy with L = 2 for a network with N = 10, C = 3.

The steps taken by such a strategy during the transition phase are illustrated in Fig. 5. It is assumed that a network with N = 10 nodes and C = 3 channels is to be reconfigured from an initial to a final WLA (as shown in the figure) and that L = 2. In this case, five receivers need to be retuned: S = {3, 4, 5, 7, 9}. In the first step, receivers 3 and 4 are taken off-line for retuning. When retuning is complete, receivers 3 and 4 become operational again, and receivers 5 and 7 are selected for retuning. Finally, receiver 9 is retuned, at which time the transition to the new WLA is complete. Thus, in this example, the transition phase takes time equal to three times the receiver retuning latency. Note that whenever L < D, i.e., for any strategy that does not permit all receivers in S to retune simultaneously, the network's WLA undergoes a series of transformations that begins with the initial WLA and ends with the new balanced WLA V'. The WLAs between the initial and final ones (called intermediate WLAs in Fig. 5) do not contain all the receivers in the network, just those that are not retuning in the current step. By using small, incremental steps during the transition phase, rather than shutting down a large part of the network to simultaneously retune all the receivers in S, we hope to minimize the negative effects on network performance while at the same time keeping the length of the transition phase relatively small.
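A minimal sketch of this class of strategies, with L as the group-size parameter, is given below; the function names and the example WLAs (chosen to mimic Fig. 5, with five receivers to retune) are ours.

    # Nonoverlapping retuning strategy: take at most L receivers off-line
    # at a time, scheduling the next group as soon as the previous one
    # becomes operational again.
    def retuning_schedule(old_wla, new_wla, L):
        S = [i for i, (o, n) in enumerate(zip(old_wla, new_wla)) if o != n]
        return [S[step:step + L] for step in range(0, len(S), L)]

    # Example in the spirit of Fig. 5: N = 10, C = 3, five receivers to
    # retune (indices 2, 3, 4, 6, 8), and L = 2.
    old = [0, 0, 1, 1, 2, 0, 1, 2, 2, 0]
    new = [0, 0, 2, 0, 1, 0, 2, 2, 0, 0]
    print(retuning_schedule(old, new, 2))  # [[2, 3], [4, 6], [8]]

The transition phase thus takes ceil(D / L) retuning latencies: three in this example, matching the walkthrough above.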

V. NUMERICAL RESULTS

We now study the reconfiguration policies and retuning strategies introduced in the previous sections using both analytical and simulation techniques.

A. Optimal Reconfiguration Policies

In this section, we demonstrate the properties of the optimal policies obtained by applying the policy-iteration algorithm [3] to the Markov decision process developed in the previous section. We also show how the optimal policy is affected by the choice of reward and cost functions, and we compare the long-term reward acquired by the network when the optimal policy is employed to the reward acquired by other policies. All the results presented in this section are for the approximate Markov process M̂ with discrete state space {(β_k, D)}.


Fig. 6. Near-neighbor model.

In this study, we consider a near-neighbor traffic model. In other words, we make the assumption that if the network currently operates with a DLB equal to β_k and no reconfiguration occurs, the next transition is more likely to take the network to a state with the same DLB or its two nearest neighbors β_{k−1} and β_{k+1} than to a DLB further away from β_k. Specifically, we assume

P[β_l | β_k] = p_0 if l = k;  p_1 if |l − k| = 1;  p_2 otherwise,  with p_0 ≥ p_1 > p_2    (4)

where the probabilities out of each state are normalized to sum to one. This traffic model is illustrated in Fig. 6, which plots the conditional probability P[β_l | β_k] that the next DLB will be β_l, given that the current DLB is β_k. The near-neighbor model captures the behavior of networks in which the traffic matrix T changes slowly over time and abrupt changes in the traffic pattern have a low probability of occurring. As a result, a network that is perfectly balanced will not immediately become highly unbalanced, and vice versa. We consider a different client-server model of communication in the next section (refer to Fig. 21) to illustrate that our approach can be used under a wide range of traffic patterns.

Given the probabilities in (4), we let the conditional transition probability, when no reconfiguration occurs, from state (β_k, D) to state (β_l, D') be equal to

P[(β_l, D') | (β_k, D)] = P[β_l | β_k] · q_{D'}    (5)

where q_{D'} is the probability that D' retunings will be required in the next reconfiguration. The probabilities q_D were measured experimentally, by running a large number of simulations using the near-neighbor model and recording the number of retunings needed at each reconfiguration instant. We also observed that the probability that random variable D takes on a particular value is independent of the DLB β; thus the product form in (5).

We note that we need to obtain two different sets of transition probabilities out of each state [3], one for each of the two possible options: the do-not-reconfigure option and the reconfigure option. The above discussion explains how to obtain the transition probability matrix for the do-not-reconfigure option. The transition probability matrix for the reconfigure option is easy to determine, since we know that, regardless of the value β_k of the current state, the next state will always be a balanced state, i.e., its DLB will be β_1. The individual transition probabilities from a state (β_k, D) to a state (β_l, D') are then obtained by making the same assumption that the distribution of D' is independent of the DLB. Therefore, the transition probabilities under the reconfigure option are

P[(β_l, D') | (β_k, D)] = q_{D'} if l = 1, and 0 otherwise.    (6)
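The two transition-probability matrices can thus be assembled directly from the measured quantities; the following sketch (array names are ours) builds them per (5) and (6) and can be fed to the policy-iteration sketch of Section III.

    # Build the two transition matrices of the approximate process.
    # p_dlb[k][l] is P[beta_l | beta_k] from the near-neighbor model (4);
    # q[d] is the measured distribution of the number of retunings D.
    # State (k, d) is enumerated as index k * len(q) + d.
    import numpy as np

    def transition_matrices(p_dlb, q):
        K, Dn = len(p_dlb), len(q)
        stay = np.zeros((K * Dn, K * Dn))      # do-not-reconfigure, eq. (5)
        reconf = np.zeros((K * Dn, K * Dn))    # reconfigure option, eq. (6)
        for k in range(K):
            for d in range(Dn):
                s = k * Dn + d
                for l in range(K):
                    for d2 in range(Dn):
                        stay[s, l * Dn + d2] = p_dlb[k][l] * q[d2]
                for d2 in range(Dn):
                    # Reconfiguring always lands in the balanced interval
                    # (l = 1, stored at array index 0).
                    reconf[s, d2] = q[d2]
        return stay, reconf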

1) Convergence of the Optimal Policy: Let us first consider the following reward and cost functions:

φ(β) = A / (1 + β)^a,    χ(D) = B · D^b    (7)

where A and B are weights assigned to the rewards and costs. These reward and cost functions can reflect performance measures such as throughput, delay, packet loss, or the control resources involved in receiver retuning. For example, a reward function of the form A/(1 + β)^a may, depending on the value of parameter a, capture either the throughput or average packet delay experienced while the network operates with a DLB equal to β. On the other hand, using a cost that is proportional to the number of retunings (i.e., b = 1) can account for the control requirements for retuning the receivers, or for the data loss incurred during reconfiguration. Furthermore, parameter b can be chosen based on which of the retuning strategies discussed in Section IV is employed. Thus, network designers can select in a unified fashion appropriate rewards and costs to achieve the desired balance among the various performance criteria of interest.

We apply Howard's algorithm [3] to a network with N = 20 nodes and C = 5 wavelengths with a near-neighbor traffic model similar to the one shown in Fig. 6. Our objective is to study the effect that the number K of intervals in the range of possible values of the DLB β has on the decisions of the optimal policy. As we mentioned in Section III-B, we expect the decisions of the optimal policy to "converge" as K → ∞. More formally, let β be a real number such that 0 ≤ β ≤ C − 1, and let I_k be the interval in which β falls when the total number of intervals is K. Also let δ_K(β, D) be the decision of the optimal policy for state (β_k, D) of Markov process M̂ when the number of intervals is K. We will say that the decisions of the optimal policy converge if

lim_{K→∞} δ_K(β, D) exists for every pair (β, D).    (8)

Fig. 7. Optimal policy decisions for N = 20, C = 5, K = 20, A = 30, and B = 1.

Fig. 8. Optimal policy decisions for N = 20, C = 5, K = 30, A = 30, and B = 1.

In Figs. 7–9, we plot the decisions of the optimal policy for the 20-node, five-wavelength network with a near-neighbor traffic model and for three different values of K; the weights used in the functions (7) were set to A = 30 and B = 1. Fig. 7 corresponds to the optimal policy for K = 20 intervals, while in Figs. 8 and 9, we increase K to 30 and 40, respectively. The histograms shown in Figs. 7–9, as well as in other figures in this section, should be interpreted as follows. In each figure, the x-axis represents the DLB (with a number of intervals equal to the corresponding value of K), while the y-axis represents the possible values of D. The vertical bar at a particular DLB value β_k has a height equal to the threshold D_th(β_k) such that

δ(β_k, D) = reconfigure if D ≤ D_th(β_k);  do not reconfigure if D > D_th(β_k).    (9)


Fig. 9. Optimal policy decisions for N = 20, C = 5, K = 40, A = 30, and B = 1.

In other words, for each value of β_k, there exists a retuning threshold value D_th(β_k) such that the decision is to reconfigure when the number of receivers to be retuned is at most D_th(β_k), and not to reconfigure if it is greater than D_th(β_k). Since the optimal policy had similar behavior for all the different reward and cost functions we considered, its decisions will be plotted as histograms similar to those in Figs. 7–9.⁴

As we can see in Figs. 7–9, the decisions of the optimal policy do converge [in the sense of (8)] as K increases. For instance, let us consider a DLB of one, which falls in the fourth interval when K = 20 (in Fig. 7), the sixth interval when K = 30 (in Fig. 8), and the seventh interval when K = 40 (in Fig. 9). In all three cases, the retuning threshold is equal to nine for these intervals; therefore, the decisions of the optimal policy for the three values of K are the same. On the other hand, for a DLB of two, the retuning threshold is 14 in Fig. 7; it drops to 13 in Fig. 8, the same as in Fig. 9. In other words, for a DLB of two, the decisions of the optimal policy are different when K = 20 than when K = 30 or K = 40 (in the former case, the decision is to reconfigure as long as the number of retunings is at most 14, while in the latter the decision is to reconfigure only when the number of retunings is at most 13). But the important observation is that the policy decisions do not change when the number of intervals increases from 30 to 40, indicating convergence. In fact, there are no changes in the optimal policy for values of K greater than 40 (not shown here). We have observed similar behavior for a wide range of values for weights A and B, for different network sizes, and for other reward and cost functions. These results indicate that a relatively small number of intervals is sufficient for obtaining an optimal policy.

⁴ That the optimal policy was found to be a threshold policy (with a possibly different retuning threshold) for each value of β_k can be explained by the fact that we only consider cost functions that are nondecreasing functions of random variable D. As a result, if the decision of the optimal policy for a state (β_k, D_1) is not to reconfigure, intuitively one expects the decision for state (β_k, D_2), where D_2 > D_1, to also be not to reconfigure, since the reconfiguration cost χ(D_2) for the latter state would be at least as large as the reconfiguration cost χ(D_1) for the former.

Another important observation from Figs. 7–9 is that the retuning threshold increases with the DLB values. This behavior can be explained by noting that, because of the near-neighbor distribution (refer to Fig. 6), when the network operates at states with high DLB values, it will tend to remain at states with high DLB values. Since the reward is inversely proportional to the DLB value, the network incurs small rewards by making transitions between such states. Therefore, the optimal policy is such that the network decides to reconfigure even when there is a large number of receivers to be retuned. By doing so, the network pays a high cost, which, however, is offset by the fact that the network makes a transition to the balanced state with a low DLB, reaping a high reward. On the other hand, when the network is at states with low DLB, it also tends to remain at such states, where it obtains high rewards. Therefore, the network is less inclined to incur a high reconfiguration cost, and the retuning threshold for these states is lower.

2) The Effect of Reward and Cost Functions: In Figs. 10–12, we apply Howard's algorithm to a network with N = 100 nodes and C = 10 wavelengths, operating under a near-neighbor model similar to the one shown in Fig. 6. For this network, we used K = 20 intervals, and we varied weights A and B in the reward and cost functions in (7) to study their effect on the optimal policy. Specifically, we let B = 1, and we varied A from 20 (in Fig. 10) to 35 (in Fig. 11) to 50 (in Fig. 12). We first observe that the optimal policy is again a threshold policy for each value of the DLB. However, as A increases, we see that the retuning threshold associated with each DLB value also increases. This behavior of the optimal policy is in agreement with intuition since, by increasing A, we increase the reward obtained by taking the network to a balanced state relative to the cost of reconfiguration, making reconfigurations more attractive. Similarly, if we keep A constant and increase B (a case not shown here), reconfiguring the network becomes less desirable, and thus the retuning threshold associated with each DLB value decreases. Overall, in our study, we have found that one can obtain a wide variety of policies by varying the values of weights A and B. It is up to the network operator to decide what values to use, and thus to make the network more or less sensitive to traffic changes (i.e., more or less likely to reconfigure).

Fig. 10. Optimal policy decisions for N = 100, C = 10, K = 20, A = 20, and B = 1.

Fig. 11. Optimal policy decisions for N = 100, C = 10, K = 20, A = 35, and B = 1.

We now proceed to study the effect of different reward and cost functions. One performance measure of interest is the probability of unnecessary reconfigurations. By making A and B large, and letting φ(β) be a slowly decreasing function as β increases, minimizing the probability of unnecessary reconfigurations becomes equivalent to maximizing (3). Similarly, the objective of minimizing the probability that the portion of the network that becomes unavailable due to reconfiguration is greater than a certain threshold D_max can be achieved by letting A be small, letting B be large, and selecting χ(D) to be a function as shown in Fig. 13 (where D_max = 30). We now consider the latter cost function χ(D) plotted in Fig. 13, while the reward function is as in (7) with A = 50. In Fig. 14, we show the decisions of the optimal policy for a network with N = 100, C = 10, and a near-neighbor traffic model when K = 20. As we can see, the retuning threshold never exceeds the value D_max = 30. Therefore, the network will never reconfigure when the number of retunings is greater than 30, as expected.

Fig. 12. Optimal policy decisions for N = 100, C = 10, K = 20, A = 50, and B = 1.

Fig. 13. Cost function χ(D) used for the policy shown in Fig. 14.

Another important performance objective is to minimize the probability that the network will not be able to handle the offered traffic load. This is equivalent to minimizing the probability that the DLB increases beyond a maximum value β_max. Let Λ_max be the maximum traffic load (in packets per packet transmission time) that will ever be allowed into the network. By definition of the DLB in (1), the load offered to the dominant channel when the DLB is β will be (1 + β)Λ_max/C. Since each channel can clear at most one packet per packet time, we must have (1 + β_max)Λ_max/C ≤ 1. Therefore, this objective can be achieved by selecting a function φ(β), as shown in Fig. 15 (where β_max = 4.5), and letting B be small. Thus, we also obtained the optimal policy for the same network as above, but with the reward function shown in Fig. 15 and a cost function χ(D) = B·D, with B = 1. The resulting policy is shown in Fig. 16, where we can see that the retuning threshold is 100 for DLB values greater than 4.5. Since the maximum number of receivers that will ever need to be retuned is smaller than 100 [1], a retuning threshold equal to 100 means that the network will always reconfigure when the DLB becomes greater than 4.5. Thus, although the network is not prohibited from entering a state with a DLB value greater than 4.5, once doing so, in the very next transition the network will reconfigure and will enter the balanced state. Subsequently, because of the nature of the near-neighbor traffic model, the network will tend to stay at states with low DLB values. In effect, therefore, the probability that the network will be operating at states with DLB values greater than 4.5 is very small when the reward function in Fig. 15 is used.

Fig. 14. Optimal policy decisions for N = 100, C = 10, K = 20, and A = 50.

Fig. 15. Reward function φ(β) used for the policy shown in Fig. 16.

Fig. 16. Optimal policy decisions for N = 100, C = 10, K = 20, and B = 1.

3) Comparison to Threshold Policies: In this section, we compare the optimal policy against three classes of threshold-based policies.

1) DLB-threshold policies. There exists a threshold DLB value β₀ such that if the system is about to make a transition into a state with DLB greater than β₀, then the network will reconfigure and make a transition to a state with DLB no greater than β₀, regardless of the reconfiguration cost involved. Otherwise, no reconfiguration occurs. This class of policies is not concerned with the reconfiguration cost incurred. Instead, it ensures that the traffic-carrying capacity of the network never falls below C/β₀ packets per packet time.

2) Retuning-threshold policies. This class of policies is in a sense a "dual" of the previous one, in that decisions are based solely on the number of retunings involved in the reconfiguration, not on the DLB. Specifically, if the network is about to make a transition, then the network will reconfigure only if the number D of receivers that must be retuned is less than or equal to a threshold D₀. If D > D₀, no reconfiguration takes place. This class of policies ensures that the portion of the network that becomes unavailable due to reconfiguration never exceeds D₀ receivers.

3) Two-threshold policies. This class of policies attempts to combine the objectives of the two classes of policies above. Specifically, there are two thresholds, β₀ and D₀. If the system is about to make a transition into a state with DLB greater than β₀, then the network will reconfigure regardless of the number of retunings. Otherwise, the network will reconfigure if the number D of receivers that must be retuned is less than or equal to D₀, and it will not reconfigure if D > D₀. We note that if we let D₀ equal the maximum number of receivers that will ever need to be retuned [1], these policies reduce to the class of DLB-threshold policies. Similarly, if we let β₀ equal the maximum DLB value, these policies become simple retuning-threshold policies. Therefore, the two-threshold policies are the most general class of policies and include the DLB-threshold and retuning-threshold policies as special cases.
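As referenced above, the decision rule of the general two-threshold class can be sketched as follows; the parameter names are illustrative, not the paper's notation.

```python
# Sketch of the two-threshold policy described in item 3 above.
# beta_next: DLB of the state the system is about to enter;
# d: number of receivers that reconfiguring would retune;
# beta_0, d_0: the two free thresholds of the policy class.

def two_threshold_reconfigure(beta_next: float, d: int,
                              beta_0: float, d_0: int) -> bool:
    if beta_next > beta_0:
        return True          # DLB too high: reconfigure regardless of cost
    return d <= d_0          # otherwise reconfigure only if retuning is cheap

# Special cases noted in the text:
#   d_0 = maximum number of retunings ever needed -> pure DLB-threshold policy
#   beta_0 = maximum DLB value                    -> pure retuning-threshold policy
```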

The DLB-threshold and the general two-threshold policies above define Markov processes that are outside the class of Markovian decision processes considered in Section III. In a Markovian decision process, there are several alternatives per state, but once an alternative has been selected for a state, transitions from this state are always governed by the chosen alternative (refer also to Fig. 3). In a DLB-threshold policy, on the other hand, the alternative selected does not depend on the current state but rather on the next state. Therefore, the system may select different alternatives when at a particular state, depending on what the next state is,5 and similarly for the two-threshold policies. Since Howard's algorithm [3] is optimal only within the class of Markovian decision processes, it is possible that these threshold policies obtain rewards higher than the optimal policy determined by the algorithm. Retuning-threshold policies, however, are such that there is a unique alternative per state, so we expect them to perform no better than the optimal policy.6

5 If the next state is one with a DLB less than the threshold, the alternative selected is not to reconfigure; otherwise, the alternative selected is to reconfigure.

6 As we have seen, the optimal policies are in fact threshold policies with a different retuning threshold for each DLB value. Therefore, the optimal policy will in general perform better than a retuning policy with the same threshold for all DLB values.
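Howard's algorithm alternates policy evaluation and policy improvement over the per-state alternatives. The following minimal sketch illustrates that structure; for brevity it uses a discounted criterion rather than the long-run average reward optimized in the paper, and all inputs (the transition matrices P, rewards r, and discount gamma) are illustrative assumptions.

```python
import numpy as np

# Minimal policy-iteration sketch in the spirit of Howard's algorithm.
#   P[a] : (S x S) transition-probability matrix under alternative a
#   r[a] : length-S vector of expected immediate rewards for alternative a

def policy_iteration(P, r, gamma=0.99):
    n_actions, n_states = len(P), P[0].shape[0]
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        r_pi = np.array([r[policy[s]][s] for s in range(n_states)])
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: pick the best alternative in every state.
        q = np.array([r[a] + gamma * P[a] @ v for a in range(n_actions)])
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v  # stable policy: optimal within the MDP class
        policy = new_policy
```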


Fig. 17. Policy comparison, N = 100, C = 10, K = 20, A = 50, and B = 1.

All the results presented in this section are for a network with N = 100 nodes, C = 10 wavelengths, a near-neighbor traffic model, and K = 20 intervals. The reward and cost functions considered are those in (7). In Fig. 17, we compare the optimal policy obtained by Howard's algorithm [3] to a number of retuning-threshold policies. The figure plots the average long-term reward acquired by each of the policies against the retuning threshold D₀. The horizontal line corresponds to the reward of the optimal policy, which, clearly, is independent of the retuning threshold. Each point of the second line in the figure corresponds to the reward of a retuning-threshold policy with the stated threshold value. As we can see, retuning-threshold policies obtain a reward that is significantly less than that of the optimal policy, as expected. Furthermore, the reward of retuning-threshold policies varies depending on the actual threshold used. Since the best threshold depends on system parameters such as the traffic patterns, the reward and cost functions, and the associated weights, it is impossible to know the best threshold to use unless one experiments with a large number of threshold values.
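The rewards compared in Fig. 17 are long-run averages. One simple way to estimate such an average for any given policy is to simulate the Markov chain it induces, as in the sketch below; the step(state, action) transition helper is a hypothetical stand-in for the chain under study, and the paper computes these rewards analytically rather than by simulation.

```python
# Sketch: Monte Carlo estimate of the long-run average reward of a policy.
# 'policy' maps a state to an action; 'step' is an assumed helper that
# returns (next_state, net_reward) for the chain being studied.

def average_reward(policy, step, start_state, n_steps=100_000):
    state, total = start_state, 0.0
    for _ in range(n_steps):
        state, gain = step(state, policy(state))
        total += gain
    return total / n_steps

# e.g., a retuning-threshold policy with hypothetical threshold d_0:
#   policy = lambda s: "reconfigure" if s.pending_retunings <= d_0 else "stay"
```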

In Fig. 18, we compare the optimal policy to a DLB-threshold policy and a number of two-threshold policies. For these results, we used A = 50 and B = 1 as the values of the weights in the reward and cost functions, respectively, in (7). This time, we plot the reward of each policy against the DLB threshold value; as in Fig. 17, the reward of the optimal policy is independent of the DLB threshold, resulting in a horizontal line in Fig. 18. We also plot the reward of DLB-threshold policies with varying DLB thresholds and of a family of two-threshold policies. Each of the three plots of two-threshold policies corresponds to a different retuning threshold, with the DLB threshold varying along the horizontal axis. Also, recall that the DLB-threshold policy is equivalent to a two-threshold policy with a retuning threshold equal to the maximum number of retunings.

The most interesting observation from Fig. 18 is that, for certain values of the DLB threshold, the DLB-threshold policy and some of the two-threshold policies achieve a higher reward than the optimal policy obtained through Howard's algorithm. This result is possible because, as we discussed earlier, the class of two-threshold policies is more general than the class of policies for which Howard's algorithm is optimal. On the other hand, we note that the reward of the DLB-threshold policy depends strongly on the DLB threshold used, and that the reward of the two-threshold policies depends on the values of both thresholds. Although within a certain range of these values the threshold policies perform better than the optimal policy, the latter outperforms the former for most threshold values. Therefore, threshold selection is of crucial importance for the threshold policies, but searching through the threshold space can be expensive. The optimal policy, however, guarantees a high overall reward and is also simpler to implement, since the network does not need to look ahead to the next state to decide whether or not to reconfigure.

Fig. 18. Policy comparison, N = 100, C = 10, K = 20, A = 50, and B = 1.

Fig. 19 is similar to Fig. 18 in that we again compare the optimal policy against a DLB-threshold policy and two-threshold policies. For these experiments, however, we have used A = 20 and B = 1 in the reward and cost functions, respectively, of (7). As we can see, the reward of the optimal policy is strictly higher than that of the threshold policies across all possible threshold values. These results demonstrate that DLB- or two-threshold policies do not always perform better than the optimal policy, and their performance depends on the system parameters and/or the reward and cost functions. Furthermore, it is not possible to know ahead of time under what circumstances the threshold policies will achieve a high reward. Equally important, if the network's operating parameters change, threshold selection must be performed anew since, for instance, the DLB threshold that maximizes the reward of the DLB-threshold policy in Fig. 18 results in very poor performance in Fig. 19, and vice versa.

Overall, the results presented in this section demonstrate that the optimal policy obtained through Howard's algorithm can successfully balance the two conflicting objectives, namely, the DLB and the number of retunings, and always achieves a high reward across the whole range of the network's operating parameters. We have also shown that by appropriately selecting the reward and cost functions, the optimal policy can be tailored to specific requirements set by the network designer. On the other hand, pure threshold policies, although they can sometimes achieve a high reward, are less flexible, and they introduce an additional degree of complexity, namely, the problem of threshold selection.

B. Simulation Study

The objective of our simulation study is twofold. First, we want to demonstrate the benefits of dynamic reconfiguration in terms of specific performance measures such as packet delay and packet loss. Second, we want to evaluate the various retuning strategies of Section IV by studying the effect of the parameter k in the algorithm shown in Fig. 4. To this end, we have developed a simulator of a WDM network that implements the reconfiguration policies and retuning strategies described earlier. In the following, we present the most important features of our simulator; for a detailed description, the reader is referred to [2].

Each channel in the network is modeled as an OC-48 (2.48 Gb/s) link. The traffic between nodes in the network follows a client-server model of communication. Each node is either a server or a client, but the number of servers is considerably smaller than the number of clients. Each client has two types of traffic sources. The first source creates the background traffic in the network by generating packets to other client nodes; the arrival rate of this traffic is relatively low. The second source models the communication of this client with its current server, and its arrival rate is significantly higher than that of the background source. A source, regardless of its type, is implemented following the bursty source model recommended by the ATM Forum for simulating VBR-rt traffic. The packet size in the network was taken to be equal to the length of an ATM cell. The nodes access the various channels using the HiPeR-ℓ MAC protocol [11], which in turn uses the algorithms in [10] to schedule packets for transmission.
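As a rough illustration of such a source (the precise ATM Forum VBR-rt parameterization is not reproduced here), an on-off bursty source can be sketched as follows; the distributions and parameters are illustrative assumptions.

```python
import random

# Sketch of an on-off bursty source: during an "on" period the source
# emits fixed-size cells back to back at its peak rate; "off" periods
# are silent. Burst and idle lengths are drawn from exponential
# distributions here, which is an illustrative simplification.

def bursty_source(mean_burst_cells: float, mean_idle_slots: float):
    """Yield 1 (cell generated) or 0 (idle) for each packet slot."""
    while True:
        burst = max(1, round(random.expovariate(1.0 / mean_burst_cells)))
        for _ in range(burst):
            yield 1
        idle = round(random.expovariate(1.0 / mean_idle_slots))
        for _ in range(idle):
            yield 0
```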

Fig. 19. Policy comparison, N = 100, C = 10, K = 20, A = 20, and B = 1.

We created variations in the traffic pattern by using the following technique. Initially, each client node is associated with one particular server. Periodically, the assignment of clients to servers is modified to reflect changes in the needs of the applications running at the client (for instance, a client may access a file server to edit a file and then a Web server to fetch a Web page). This change in client-server assignments creates a shift in the traffic pattern and may result in an unbalanced network where one or more channels carry a large fraction of the overall traffic. Network nodes accumulate traffic data for each source-destination pair over 2-ms intervals, using the in-band control packets of HiPeR-ℓ [11]. These data are used to compute the current DLB of the network and to determine whether reconfiguration is needed. As a result, successive reconfigurations are always spaced at least 2 ms apart. However, no necessary reconfigurations were ever delayed, since in our model each client-server assignment lasts for at least 20 ms.7
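To illustrate this measurement step, the sketch below computes a DLB value from per-pair traffic counts and the current WLA, assuming the DLB takes the usual form of the dominant-channel load relative to the perfectly balanced load; the data structures and names are hypothetical.

```python
# Sketch: computing the current DLB from measured traffic.
#   traffic[s][d] : packets counted for source s, destination d in the
#                   last measurement interval (assumed structure)
#   wla[d]        : channel to which destination d's receiver is tuned

def current_dlb(traffic, wla, num_channels):
    load = [0.0] * num_channels
    for s, row in enumerate(traffic):
        for d, pkts in enumerate(row):
            load[wla[d]] += pkts
    total = sum(load)
    # Dominant-channel load divided by the perfectly balanced per-channel
    # load (total / num_channels); equals 1 for a perfectly balanced WLA.
    return num_channels * max(load) / total if total else 1.0
```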

We collected experimental traffic data for the client-server communication model, from which we built a conditional distribution of DLB values for K = 20 intervals. This conditional distribution is plotted in Fig. 20. As we can see, this distribution is quite different from the near-neighbor distribution of Fig. 6. Using the reward and cost functions in (7), the optimal policy obtained by Howard's algorithm is shown in Fig. 21. This optimal policy was used throughout this section.
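One plausible way to build such a conditional distribution from a measured DLB trace is sketched below, assuming uniform quantization of the DLB range into K intervals; the paper's exact binning is not specified here, so these details are illustrative.

```python
from collections import Counter, defaultdict

# Sketch: empirical conditional distribution of next-interval DLB given
# the current DLB, from a trace of per-interval DLB measurements.
# Assumes the DLB is bounded below by 1 and quantized into k uniform bins.

def conditional_dlb_distribution(dlb_trace, dlb_max, k=20):
    width = (dlb_max - 1.0) / k
    to_bin = lambda v: min(int((v - 1.0) / width), k - 1)
    counts = defaultdict(Counter)
    for cur, nxt in zip(dlb_trace, dlb_trace[1:]):
        counts[to_bin(cur)][to_bin(nxt)] += 1
    # Normalize each row into a probability distribution over next bins.
    return {cur: {nxt: c / sum(row.values()) for nxt, c in row.items()}
            for cur, row in counts.items()}
```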

7 While a change in the client-server assignment is likely to create a completely new traffic pattern and lead to reconfiguration, the behavior of individual sources within a given assignment also causes (smaller) changes in the overall traffic. The accumulation of a large number of these changes may create the need for a reconfiguration within a given client-server assignment.

Fig. 20. Conditional distribution of DLB values in the client-server model.

For the purposes of this study, we consider a WDM network with N = 70 nodes, of which seven are servers and 63 are clients. There are C = 5 channels in the network, with an aggregate capacity of 12.4 Gb/s. The fast tunable transmitters are assumed to take two packet times (342 ns for 53-byte packets at OC-48 speeds) to switch wavelengths. Each simulation run terminates when a fixed total number of packets have been successfully received by their respective destinations. In the following, we describe two sets of experiments. First, we compare the average packet delay and packet loss in networks with and without dynamic reconfiguration. Second, we compare various retuning strategies in order to determine the effect of several important parameters on network performance.


Fig. 21. Optimal reconfiguration policy calculated for the client-server traffic pattern.

Fig. 22. Average packet delay in networks with and without reconfiguration capabilities.

1) Benefits of Dynamic Network Reconfiguration: We study the performance of the network with and without reconfiguration capabilities. When no reconfiguration is allowed, the assignment of receivers to wavelengths (WLA) remains fixed throughout the simulation (a static network). In a dynamic network, reconfigurations are governed by one of two policies: the optimal MDP policy depicted in Fig. 21 or a DLB-threshold policy (see Section V-A3) with a DLB threshold equal to 0.8. For either policy, the retuning strategy in Fig. 4 with the largest value of k is used, i.e., all receivers needing retuning are simultaneously retuned in one step. The receiver tuning latency is assumed to be equal to 1000 packet times (i.e., three orders of magnitude greater than the transmitter latency), while the buffer capacity at each network node is fixed at 200 packets per channel.

In Fig. 22, we plot the average packet delay for each network as a function of total network throughput. As we can see, the networks with a dynamic reconfiguration capability clearly outperform the one without such capability. As the load approaches and exceeds 50% of the total channel capacity (equal to 12.4 Gb/s), the delay in the static network increases very rapidly to nearly 800 μs. On the other hand, the average packet delay in the dynamically reconfigurable networks grows much more slowly and remains under 100 μs for the loads shown. Also, the network using the optimal reconfiguration policy achieves lower average packet delays than the network using the DLB-threshold policy. These results can be explained by the stability condition derived in [11], which states that the maximum sustained traffic load is inversely proportional to the average DLB of the network. By periodically rebalancing the traffic load across the channels, reconfigurable networks maintain a significantly lower average DLB than static networks, allowing them to accommodate higher traffic loads. Furthermore, the optimal policy achieves a lower average DLB than the DLB-threshold policy, resulting in lower delays.

Fig. 23. Packet loss in networks with and without reconfiguration capabilities.

In Fig. 23, we plot the percentage of packets lost in the three networks. As we can see, at low loads, the static network incurs no losses, while the reconfigurable networks do experience some packet loss. This loss is not due to buffer overflows; instead, it takes place during the transition phase due to reconfiguration. Recall that when a receiver is to be retuned, packets currently buffered for it at the various nodes are discarded, since they are buffered for transmission on the wrong (old) wavelength. Thus, this loss represents the penalty incurred for having a network that is traffic-adaptive. However, at higher loads, the static network experiences quite high losses due to buffer overflows. Since the WLA is fixed in a static network, when the traffic patterns change such that one channel carries a large part of the overall traffic, buffers for this channel quickly fill up in all network nodes. Reconfigurable networks, however, periodically rebalance the traffic load, preventing unnecessary buffer overflows. As a result, at higher loads, their packet losses are significantly lower than in the static network. Again, packet loss is lowest when the optimal policy is used; at the highest load in Fig. 23, the packet loss with the optimal policy is one order of magnitude lower than the loss in the static network.

Fig. 24. Maximum buffer occupancy in networks with and without reconfiguration capabilities.

Fig. 24, which shows the maximum buffer occupancy for the various traffic loads, provides further support for these conclusions. At low loads, buffer occupancy levels for the static network are low, but at high loads the maximum buffer occupancy equals the buffer capacity, indicating packet loss due to overflows. On the other hand, buffer occupancy for the reconfigurable networks is low at low loads, confirming that the losses shown in Fig. 23 are indeed due to reconfiguration. Furthermore, the maximum buffer occupancy when the optimal policy is used is quite low and never reaches the buffer capacity for the range of loads shown. This fact explains the low delays in Fig. 22 and indicates that this network never loses any packets due to buffer overflows and can accommodate even higher loads. When the DLB-threshold policy is used, the maximum buffer occupancy is higher overall and equals the capacity for the highest load shown.

We conclude that under moderate-to-high traffic loads, the performance of a WDM network, in terms of both average packet delay and packet loss, benefits significantly from a dynamic reconfiguration capability. Furthermore, this improvement in performance is achieved with a decrease in the amount of buffer space required. These results clearly demonstrate that by adapting itself to prevailing traffic conditions, the novel RTT-STR architecture introduced in this paper can provide a cost-effective solution for building high-performance WDM networks. They also illustrate the importance of employing an optimal reconfiguration policy, since the network using the optimal policy clearly outperforms the one with the DLB-threshold policy.

2) Comparison of Retuning Strategies: We now compare the various retuning strategies in the class shown in Fig. 4 in terms of the same performance metrics considered in the previous section, namely, average packet delay, packet loss, and maximum buffer occupancy. In our study, we consider only a reconfigurable network employing the optimal policy of Fig. 21 and operating under an offered load of 7 Gb/s (the maximum load shown in Figs. 22-24). We have obtained results for three values of the receiver tuning latency, corresponding to 1000, 2000, and 5000 packet transmission times, in order to study the effect of this important parameter on the performance of the various retuning strategies. For the experiments presented in this section, the buffer capacity at each node was set to such a high value that there were never any buffer overflows. Therefore, all the losses shown are entirely due to the transition phase during reconfiguration.

Fig. 25. Average packet delay comparison of retuning strategies.

In Figs. 25-27, we plot the average packet delay, packet loss, and maximum buffer occupancy, respectively, as a function of the parameter k (recall that in Fig. 4, k corresponds to the maximum number of receivers that can be simultaneously retuned in one step of the transition phase). As we can see, as the receiver tuning latency increases, all three performance measures are negatively affected. This behavior can be explained by noting that the longer it takes each receiver to retune, the longer a group of receivers will be off-line during the transition phase, and the longer it will take to reconfigure the network for a given value of k. While a receiver is not operational, newly arriving packets destined for it have to be buffered. Thus, for higher retuning latency values, the maximum buffer occupancy increases (Fig. 27), leading to higher losses (Fig. 26) as well as higher delays (Fig. 25). We also note, however, that within the range of values of k for which the best performance is observed over all three measures, the curves corresponding to a tuning latency of 2000 are very close to those for a latency of 1000, and even the curves for a latency of 5000 are not significantly different. Furthermore, the average packet delay, the packet loss, and the maximum buffer occupancy for a latency of 5000 and a well-chosen k are lower than the corresponding values for a DLB-threshold policy in Figs. 22-24. This result indicates that if an appropriate value for k is used, then even very slow (and, consequently, inexpensive) tunable devices can be adequate for reaping the benefits of reconfiguration.
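The class of strategies being compared can be summarized by the sketch below, which reduces the algorithm of Fig. 4 to its essential grouping behavior; the names and the callback are illustrative, and details such as group ordering are omitted.

```python
# Sketch of the parameterized retuning strategies: the set of receivers
# that must be retuned is split into groups of at most k, and the groups
# are taken off-line and retuned one after another.

def retune_in_groups(receivers_to_retune, k, retune_group):
    """retune_group is a callback that takes one group off-line,
    retunes it, and brings it back on-line."""
    pending = list(receivers_to_retune)
    for i in range(0, len(pending), k):
        retune_group(pending[i:i + k])  # at most k receivers off-line at once

# k = len(pending): everything retuned in one step (shortest transition,
#                   largest portion of the network off-line at once).
# k = 1:            one receiver at a time (longest transition, least
#                   concurrent unavailability).
```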

Finally, from the three figures, we observe that the two extreme strategies, obtained for large values of k (i.e., simultaneously retuning all required receivers at once) and for k = 1 (i.e., retuning one receiver at a time), do not perform as well as strategies obtained for intermediate values of k. We can explain this result by considering each of the two extreme strategies separately. When all of the receivers are allowed to retune at the same time, the network takes the shortest possible time to reconfigure to an optimal WLA; however, during this time a large number of receivers are not operational. As a result, even though the average packet delay is very low due to a short transition phase, packet losses are relatively high. When the network is only allowed to retune the receivers one at a time, the transition phase is the longest among all strategies. Consequently, the network operates under a suboptimal WLA for a long time, and performance suffers in terms of both packet delay and loss. The results in the figures indicate that strategies with intermediate values of k achieve the best overall performance.

Fig. 26. Packet loss comparison of retuning strategies.

Fig. 27. Maximum buffer occupancy of retuning strategies.

VI. CONCLUDING REMARKS

We have conducted an in-depth study of the reconfiguration problem in traffic-adaptive WDM networks. Our objective was to investigate three open issues: how frequently to reconfigure the network, how to structure the reconfiguration phase, and how to quantify the benefits of reconfiguration to the network in terms of measurable performance parameters.

In order to address the first issue, we developed a formulation based on Markov decision process theory. Given some information regarding the traffic patterns in the network (which can be obtained empirically), the formulation allows us to apply existing algorithms (specifically, Howard's policy-iteration algorithm) to obtain optimal dynamic reconfiguration policies off-line. These policies can then be used during the operation of the network (on-line) to determine the instants at which reconfiguration must take place. The policies can be fine-tuned to strike the desired balance between two important but conflicting objectives: the degree of load balancing (which, if considered in isolation, would be optimized by reconfiguring any time there is a slight change in the traffic pattern) and the degree of network unavailability as captured by the number of transceiver retunings (which would be optimized by never reconfiguring the network). Our formulation provides two distinct tools for fine-tuning the policies: the cost functions (which can be selected to reflect specific performance measures) and the weights applied to each objective.

For the second issue, we presented a class of parameterized retuning strategies for carrying out the reconfiguration phase. Under these strategies, the transceivers to be retuned are grouped into sets, and the transceivers in each set are simultaneously taken off-line for retuning. Using simulation, we explored the tradeoffs between the length of the reconfiguration phase (which would be kept short if all transceivers were grouped in a single set) and the portion of network resources that become unavailable at any given time (which would be optimized if each set consisted of a single transceiver). While the relative performance of the strategies is affected by the transceiver tuning latency, our results demonstrate that neither of the two extreme strategies is optimal. Our findings indicate that the strategies with the best overall performance are those in the middle of the spectrum between the two extremes.

Finally, using simulation, we have quantified the benefits of reconfiguration in terms of important performance measures such as average packet delay and packet loss. Specifically, we have observed that at moderate-to-high traffic loads, a network that employs our optimal reconfiguration policies significantly outperforms static networks (no reconfiguration) or networks that apply simple threshold-based reconfiguration policies.

Overall, our work demonstrates that by employing slowly tunable devices, it is possible to build traffic-adaptive, high-performance multiwavelength networks cost-effectively. While in this paper we have not considered the effect of equipment failures or signal impairments, reconfiguration can also be invoked as a tool to recover from anomalous conditions. The development of reconfiguration policies for recovery from failures and impairments is a topic of future research.

REFERENCES

[1] I. Baldine and G. N. Rouskas, "Dynamic load balancing in broadcast WDM networks with tuning latencies," in Proc. IEEE INFOCOM '98, Mar. 1998, pp. 78-85.

[2] I. Baldine, "Dynamic reconfiguration in broadcast WDM networks," Ph.D. dissertation, North Carolina State Univ., Raleigh, 1998.

[3] R. A. Howard, Dynamic Programming and Markov Processes. Cambridge, MA: MIT Press, 1960.

[4] J.-F. P. Labourdette, "Traffic optimization and reconfiguration management of multiwavelength multihop broadcast lightwave networks," Comput. Networks ISDN Syst., 1998, to be published.

[5] J.-F. P. Labourdette and A. S. Acampora, "Logically rearrangeable multihop lightwave networks," IEEE Trans. Commun., vol. 39, pp. 1223-1230, Aug. 1991.

[6] J.-F. P. Labourdette, F. W. Hart, and A. S. Acampora, "Branch-exchange sequences for reconfiguration of lightwave networks," IEEE Trans. Commun., vol. 42, pp. 2822-2832, Oct. 1994.

[7] B. Mukherjee, "WDM-based local lightwave networks Part I: Single-hop systems," IEEE Network Mag., pp. 12-27, May 1992.

[8] H. G. Perros and K. M. Elsayed, "Call admission control schemes: A review," IEEE Commun. Mag., vol. 34, no. 11, pp. 82-91, 1996.

[9] G. N. Rouskas and M. H. Ammar, "Dynamic reconfiguration in multihop WDM networks," J. High Speed Networks, vol. 4, no. 3, pp. 221-238, 1995.

[10] G. N. Rouskas and V. Sivaraman, "Packet scheduling in broadcast WDM networks with arbitrary transceiver tuning latencies," IEEE/ACM Trans. Networking, vol. 5, pp. 359-370, June 1997.

[11] V. Sivaraman and G. N. Rouskas, "A reservation protocol for broadcast WDM networks and stability analysis," Comput. Networks, vol. 32, no. 2, pp. 211-227, Feb. 2000.

Ilia Baldine was born in Dubna, Moscow Region, Russia, in 1972. He received the B.S. degree in computer science from the Illinois Institute of Technology, Chicago, in 1993, and the M.S. and Ph.D. degrees in computer science from North Carolina State University, Raleigh, in 1995 and 1998, respectively.

He currently works as a Network Research Engineer with the Advanced Networking Research group at MCNC, a non-profit research corporation located in Research Triangle Park, North Carolina. His research interests include all-optical networks, ATM networks, network protocols, and security.

George N. Rouskas (S'92-M'95) received the Diploma in electrical engineering from the National Technical University of Athens (NTUA), Athens, Greece, in 1989, and the M.S. and Ph.D. degrees in computer science from the College of Computing, Georgia Institute of Technology, Atlanta, in 1991 and 1994, respectively.

He joined the Department of Computer Science, North Carolina State University, in August 1994, and has been an Associate Professor since July 1999. In summer 2000, he was an Invited Professor at the University of Evry, France, and during the 2000-2001 academic year he is on sabbatical leave at Vitesse Semiconductor. His research interests include network architectures and protocols, optical networks, multicast communication, and performance evaluation.

Dr. Rouskas is a recipient of a 1997 NSF Faculty Early Career Development (CAREER) Award and a coauthor of a paper that received the Best Paper Award at the 1998 SPIE conference on All-Optical Networking. He also received the 1995 Outstanding New Teacher Award from the Department of Computer Science, North Carolina State University, and the 1994 Graduate Research Assistant Award from the College of Computing, Georgia Tech. He was a Co-Guest Editor for the IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, Special Issue on Protocols and Architectures for Next Generation Optical WDM Networks, which appeared in 2000, and is on the editorial boards of the IEEE/ACM TRANSACTIONS ON NETWORKING and the Optical Networks Magazine. He is a member of the ACM and of the Technical Chamber of Greece.