

Computer Networks 58 (2014) 2–12

Contents lists available at ScienceDirect

Computer Networks

journal homepage: www.elsevier.com/locate/comnet

Localizing link failures in all-optical networks using monitoring tours

Sangman Cho, Srinivasan Ramasubramanian ⇑
Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ 85721, United States

⇑ Corresponding author. Tel.: +1 5206214521. E-mail addresses: [email protected] (S. Cho), [email protected]. edu (S. Ramasubramanian).

1389-1286/$ - see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.comnet.2013.07.018

Article info

Article history: Received 5 March 2013; Accepted 29 July 2013; Available online 2 September 2013

Keywords: Failure localization; Link failures; Monitoring cycles; Monitoring paths; Monitoring tours

Abstract

In this paper, we introduce the concept of monitoring tours (m-tours) to uniquely localize all possible failures of up to k links in all-optical networks. We establish paths and cycles that can traverse the same link at most twice (forward and backward) and call them m-tours. An m-tour differs from existing schemes such as m-cycles and m-trails, which traverse a link at most once. Closed (open) m-tours start and terminate at the same (distinct) monitor location(s). Each tour is constructed such that any shared risk link group (SRLG) failure results in the failure of a unique combination of closed and open m-tours. We prove that k-edge connectivity is a sufficient condition to localize all SRLG failures with up to k-link failures when only one monitoring station is employed. We introduce an integer linear program (ILP) and a greedy scheme to find the monitoring locations to uniquely localize any SRLG failures with up to k links. We provide a heuristic scheme to compute m-tours for a given network. We demonstrate the validity of the proposed monitoring method through simulations. We show that our approach using m-tours significantly reduces the number of required monitoring locations compared to previously developed techniques.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

The high data rates offered by optical transmission technology have increased the number of multimedia and interactive applications over the Internet in the past few years. However, the increased data rate also increases the amount of data lost due to temporary service disruption caused by fiber cuts or component failures. Therefore, fault detection and localization has become one of the most important issues in a network. Although single-link failures are more common, multiple-link failures occur due to shared risks. Such risks include the routing of fibers through the same duct, failure of a link while another link is under maintenance, or natural disasters that cause links traversing a region to fail.

Fault detection and localization may be performed at the physical layer by employing optical power detection and optical spectrum analysis [1,2]. To detect faults, monitors placed at several network locations generate an alarm whenever a fault occurs. By observing the generated alarms, the precise location of the fault may be identified. Several researchers have developed methods for localizing failures by observing the alarms generated by monitors [3–7]. Conventional link-based monitoring schemes need one monitor at each communication link. In [8], an adaptive technique for fault diagnosis using "probes" was presented. According to this scheme, probes are established sequentially, each time using information about already established probes. While sequential probing helps achieve adaptiveness, it also increases the fault localization time. In [9], a non-adaptive fault diagnosis approach was developed based on establishing a set of probes. The techniques presented in [8,9] assume that any node can generate and terminate (analyze) a probe. Thus, any node could be a monitor. As monitoring of signals in the optical domain is expensive, it is imperative that the number of nodes employing monitors is minimized. In order to minimize the number of monitors used for fault detection, [1,3] provided an algorithm for finding optimal monitor placement.

Fig. 1. Types of monitoring probes. Nodes 1 and 6 are monitoring stations. (a) Monitoring cycle (1–2–5–3–4–5–1); (b) monitoring path (1–2–5–3–4–5–6); (c) monitoring tour (1–2–5–2–1).

To detect and localize failures more efficiently, m-cycles [4,5] and m-trails [6,7,10,11] were proposed. In the m-cycle scheme, a cycle that starts and ends at the same node is employed to monitor the network condition. The m-trail removes the cycle constraint, so a trail may start and terminate at different monitor nodes. For m-trails and m-cycles, it is assumed that one monitor per cycle/trail is required and that a cycle/trail can pass through a node several times but a link at most once. Since the minimum number of trails/cycles required is $\lceil \log (F + 1) \rceil$, where F is the number of failures to localize, we need at least $\lceil \log (F + 1) \rceil$ monitors. It is worth noting that several of the monitors could be placed at the same node. As the monitors are expensive, it is practical to assume that a node may employ only one monitor and that the monitor could be timeshared across different trails/paths.
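The lower bound above can be read as a counting argument: with m probes, each failure must produce a distinct non-empty pattern of failed probes, and only 2^m − 1 such patterns exist. The following is a minimal sketch of that arithmetic, assuming a base-2 logarithm (the base is not stated in the text); the helper name `min_probes` is hypothetical.

```python
def min_probes(num_failures: int) -> int:
    """Smallest m such that 2**m - 1 >= num_failures (assumes log base 2)."""
    if num_failures < 0:
        raise ValueError("num_failures must be non-negative")
    m = 0
    # m probes yield 2**m - 1 distinct non-empty alarm patterns.
    while (1 << m) - 1 < num_failures:
        m += 1
    return m


if __name__ == "__main__":
    for f in (1, 3, 4, 7, 8):
        print(f, min_probes(f))
```

The bound only counts distinguishable patterns; whether a set of probes achieving it exists depends on the topology.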

In [12,13], the authors used monitoring paths and cycles¹ to localize single-link and Shared Risk Link Group (SRLG) failures. They proved that (k + 2)-edge connectivity is necessary and sufficient to uniquely localize all SRLG failures involving up to k links with one monitor. If the network is not (k + 2)-edge-connected, then the minimum number of monitors and their placement are identified. In addition, the authors developed a generic method for computing monitoring cycles and paths by merging all the monitors in the network and computing monitoring cycles. When the monitoring nodes are expanded, monitoring cycles/paths are obtained to uniquely localize all failures. The problem of localizing SRLG failures using m-trails was also studied in [14,15].

One of the major drawbacks of the monitoring cycles/paths/trails based approaches is that they all assume a link may be traversed in only one direction by a cycle/path/trail. In practice, links are directional due to the use of in-line amplifiers, so a bidirectional link is realized using two unidirectional links running in opposite directions. We may therefore allow a probe to traverse a link in both directions (similar to that employed in [8,9]). We refer to such probes as monitoring tours (m-tours). Monitoring cycles/paths/trails are then simply a special case of monitoring tours. Because tours significantly reduce the number of monitors that need to be employed in the network, the average length of a tour may increase. We assume that sufficient optical regenerators are deployed along the fiber to amplify the degraded signals and that the degradation in signal quality is measurable after these regenerations.

¹ The monitoring paths and cycles are the same as m-trails and m-cycles. The monitoring paths start and end at distinct monitoring nodes, while monitoring cycles start and end at the same monitoring node. Both cycles and paths may be non-simple, where a node may be traversed multiple times.

Fig. 1 shows examples of a monitoring cycle, path, and tour. Nodes 1 and 6 are assumed to be the monitoring stations. The monitoring cycle starts and ends at the same monitoring station. The monitoring path starts and ends at distinct monitoring stations. The monitoring cycle and path shown here are a "non-simple" cycle and path, respectively, as node 5 appears twice. By allowing a probe to traverse a link at most twice, once in each direction, the approach using m-tours can enrich the probe paths and increase the flexibility for the localization problem. Our goal in this paper is to study the properties of localizing link failures using monitoring tours and evaluate the trade-offs when compared to employing only monitoring cycles and paths.

1.1. Contributions

In this paper, we show that to uniquely localize SRLG failures involving up to k link failures with one monitor and m-tours, it is necessary and sufficient that the link graph of the given network is k-vertex-connected and the degree of the monitor node is at least k. By contrast, the authors of [13] showed that (k + 2)-edge connectivity is a necessary and sufficient condition to localize up to k link failures with a single monitor and cycles. Therefore, by using tours, we can relax the connectivity condition and greatly reduce the number of monitors required to localize up to k link failures. According to [14,15], the monitoring cost is more important than the bandwidth cost (length of a monitoring cycle/path/tour) in the total cost. By reducing the number of monitors required, we can significantly reduce the monitoring cost and simplify network management, which is our objective in this study.

Next, we identify the necessary and sufficient conditions on the placement of monitors if the given network does not satisfy the connectivity requirement to localize with one monitor. We develop an integer linear program and a greedy heuristic to compute the minimum number of monitors required. We show that in certain situations, computing the minimum number of monitors also solves the vertex-cover problem and is thus NP-Hard. We develop an algorithm to compute monitoring tours for a given placement of monitors. We compare the performance of monitoring with tours to that of monitoring with cycles/paths in terms of the number of monitors required, the number of cycles/paths/tours employed, and the total network resources consumed. We study the performance in both real-life and random network topologies. Through extensive simulations, we show that localizing using monitoring tours can significantly reduce the number of monitors required.

Fig. 2. Example illustrating localizing five single-link and two SRLGfailures using monitoring tours: T1: 1–2–1, T2: 1–3–1, T3: 1–4–1, T4: 1–2–3–2–1, and T5: 1–4–3–4–1.

1.2. Organization

The rest of the paper is organized as follows: Section 2 discusses the network model, preliminaries, and the concept of failure localization using monitoring tours. Section 3 proves the necessary and sufficient condition on the network topology for localizing SRLG failures with up to k link failures using one monitor. Section 4 derives the necessary and sufficient conditions on the placement of monitors. Section 5 describes the construction of monitoring tours. Section 6 describes the simulation setup and the performance results. Section 7 concludes the paper.

2. Network model

We consider a network $G(\mathcal{N}, \mathcal{L})$, where $\mathcal{N}$ and $\mathcal{L}$ denote the set of nodes and links, respectively. We assume that all links are bidirectional, composed of two unidirectional links. We refer to undirected links as simply links. We refer to directed links as edges. We denote an undirected link connecting nodes x and y as x–y and the directed link (edge) from x to y as $x \to y$. Let $W$ denote the set of Shared Risk Link Group (SRLG) failures. A failure $w \in W$ is a set of directed links that may fail at the same time. When link x–y fails, we assume that both directed edges $x \to y$ and $y \to x$ fail. Therefore, for any $w \in W$, $(x \to y \in w) \Rightarrow (y \to x \in w)$.

Some nodes in the network, referred to as monitors, may have the ability to send, receive, and analyze test signals. A monitoring tour, or simply an m-tour, is a directed path that starts and ends at a monitor. If the tour starts and ends at the same monitor, it is referred to as a closed m-tour; otherwise it is referred to as an open m-tour.

A monitoring tour may traverse a link in both directions. Let $T$ denote the set of m-tours established in the network. A tour $t \in T$ is said to be affected by a failure $w$ if the tour traverses any of the edges in $w$. A failure $w$ causes some set of m-tours to fail. The set of tours affected by $w$ is referred to as the syndrome for $w$. Thus, our goal is to construct m-tours such that every failure has a unique syndrome. In other words, every failure may be uniquely localized by observing the set of m-tours that have failed. When employing multiple monitors, every monitor is assumed to broadcast the detected alarms to other monitors or send them to a central location for localization. Our construction will ensure that every edge is part of at least one m-tour. Thus, if none of the m-tours have failed, then none of the links have failed.

In [13], the necessary and sufficient conditions on the properties of monitoring cycles/paths for uniquely localizing SRLG failures are provided. Similarly, in order to uniquely localize failure $w$, (1) it is necessary that at least one m-tour traverses at least one edge in $w$; and (2) for every other failure $w'$ there exists an m-tour that is affected by $w$ and not $w'$, or vice versa.

2.1. Example

Consider the network shown in Fig. 2(a), where node 1 is assumed to be the monitor. We construct five tours, T1: 1–2–1, T2: 1–3–1, T3: 1–4–1, T4: 1–2–3–2–1, and T5: 1–4–3–4–1. We consider five single-link failures and two SRLG failures. Fig. 2(b) shows the tours affected by the five single-link and two SRLG failures. Observe that every single-link and SRLG failure causes a unique combination of tours to fail.
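The syndromes for the single-link failures in this example can be recomputed with a few lines of Python. This is a minimal sketch: the link set is inferred from the five tours listed above, the helper names are hypothetical, and the two SRLGs of Fig. 2(b) are omitted because their member links are given only in the figure.

```python
def tour_links(tour):
    """Undirected links traversed by a tour given as a sequence of nodes."""
    return {frozenset(pair) for pair in zip(tour, tour[1:])}


def syndrome(failure, tours):
    """Names of the tours that traverse at least one failed link.

    A failed link is assumed to fail in both directions, so links are
    compared as unordered node pairs.
    """
    failed = {frozenset(link) for link in failure}
    return frozenset(name for name, tour in tours.items()
                     if tour_links(tour) & failed)


# Tours from the example; node 1 is the monitor.
tours = {
    "T1": (1, 2, 1),
    "T2": (1, 3, 1),
    "T3": (1, 4, 1),
    "T4": (1, 2, 3, 2, 1),
    "T5": (1, 4, 3, 4, 1),
}
# Link set inferred from the tours above (an assumption about Fig. 2(a)).
links = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]

single = {link: syndrome([link], tours) for link in links}

if __name__ == "__main__":
    for link, tours_hit in single.items():
        print(link, sorted(tours_hit))
```

Each of the five single-link syndromes is non-empty and distinct, which is the uniqueness property the example relies on.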

If we restrict the construction to only monitoring cycles and/or paths, then such an approach would have required at least three monitors to localize the five single-link and two SRLG failures. Monitors at nodes 2 and 4 are necessary as they have degree two, and a monitor at node 3 is also necessary as the removal of $w_1$ or $w_2$ would not allow a monitoring cycle/path to be established between nodes 2 and 4.

Thus, the use of monitoring tours helps in significantly reducing the number of monitors required in the network, as a tour can traverse a link in both directions.

3. Localizing with one monitor

We first derive the necessary and sufficient condition on the network topology to uniquely localize all SRLG failures involving up to k arbitrary link failures.

Note that when only monitoring cycles/paths are employed, it is necessary and sufficient for a network to be (k + 2)-edge-connected to uniquely localize all SRLG failures involving up to k links. However, tours may traverse a link in both directions, thus relaxing these necessary and sufficient requirements.

Theorem 1. In order to uniquely localize any SRLG failure involving up to k arbitrary links with only one monitor and m-tours, it is sufficient that the network is k-edge-connected.

Fig. 3. Link graph corresponding to the network shown in Fig. 2(a). The node numbers in the link graph are obtained by combining the numbers of the nodes connected by the link in the original network.

Proof. Consider a k-edge-connected network. Since the network is connected, we can construct an m-tour traversing any link. Assume two failures $w_1$ and $w_2$ that have the same syndrome. Our goal is to construct an m-tour that is affected by $w_1$ and not by $w_2$, or vice versa.

First, let us consider the case that $w_1 \subset w_2$. Since $w_1$ is a proper subset of $w_2$, (1) the maximum number of links in $w_1$ is $k-1$; and (2) $w_2$ has at least one link that is not in $w_1$. As the network is assumed to be k-edge-connected, removal of all the links in $w_1$ will not disconnect the graph. Moreover, the graph obtained by removing all the links in $w_1$ contains at least one link in $w_2$. Thus, we compute a tour traversing the monitor and the link unique to $w_2$ in the residual graph. This tour will be affected by $w_2$ and not by $w_1$.

Now, let us consider the case that neither $w_1$ is a subset of $w_2$ nor $w_2$ is a subset of $w_1$. Therefore, the maximum number of links that can be common to both $w_1$ and $w_2$ is $k-1$. Consider the graph obtained by removing the links that are common to $w_1$ and $w_2$. The residual graph is connected, as the original network is k-edge-connected and we have removed at most $k-1$ links. From this residual graph, consider the removal of links that are unique to $w_1$. The resultant graph has at most two components.²

We denote the component that contains the monitor by $C_m$ and the other component (if present) by $C'_m$. We have two cases: (1) some link in $w_2 \setminus w_1$ is present in $C_m$; or (2) no link that is unique to $w_2$ is present in $C_m$. For the first case, we may simply compute a tour traversing the monitor and the link unique to $w_2$. Such a tour will be affected by $w_2$ and not by $w_1$. If no link unique to $w_2$ is contained in $C_m$, we simply consider the component $C_m$ along with the links unique to $w_1$. This augmented component still remains connected, contains links in $w_1$ that are not in $w_2$, and does not contain any link in $w_2$. In this augmented component, we compute a tour that traverses the monitor and a link unique to $w_1$. Such a tour is affected by $w_1$ and not by $w_2$. □

Although k-edge-connectivity is a sufficient condition for uniquely localizing any SRLG failure involving up to k links, it is not necessary. Consider the example network shown in Fig. 2(a). Even though the network is only two-edge-connected, all SRLG failures involving up to three links may be uniquely localized with only one monitor and m-tours.

Although the necessary condition on the network topology may not be elegantly stated in terms of the given network, it may be stated on a transformation of the original graph, referred to as the link graph.

² Since the network is k-edge-connected, removal of any $k-1$ links will result in a connected graph. Thus, the removal of one additional link may result in at most two components.

3.1. Link Graph

Given a network $G(\mathcal{N}, \mathcal{L})$, we construct a link graph $G'(\mathcal{N}', \mathcal{L}')$, where every node in $\mathcal{N}'$ is a bidirectional link in $\mathcal{L}$. A link between two nodes $x, y \in \mathcal{N}'$ exists in $\mathcal{L}'$ if the two links x and y in $G$ are incident on the same node. An SRLG failure in the given network will translate into a shared risk node group (SRNG) failure in the link graph.

Fig. 3 shows the link graph corresponding to the network shown in Fig. 2(a).
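The link-graph construction can be sketched directly from its definition. The code below is illustrative only: it assumes a simple graph (no parallel links), uses the link set inferred from the tours of the Section 2.1 example, and the names `link_graph` and `label` are hypothetical. Labels are formed by concatenating the endpoint numbers, as described in the caption of Fig. 3.

```python
from itertools import combinations


def link_graph(links):
    """Adjacency of the link graph.

    Every link of the network becomes a node; two such nodes are adjacent
    when the corresponding links are incident on a common network node.
    """
    nodes = [frozenset(link) for link in links]
    adjacency = {node: set() for node in nodes}
    for a, b in combinations(nodes, 2):
        if a & b:  # the two links share an endpoint
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def label(link):
    """Label such as '12' for link 1-2."""
    return "".join(str(n) for n in sorted(link))


# Links of the example network (inferred from the tours; an assumption).
fig2_links = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]
lg = link_graph(fig2_links)
degrees = {label(node): len(nbrs) for node, nbrs in lg.items()}

if __name__ == "__main__":
    for node, nbrs in lg.items():
        print(label(node), sorted(label(n) for n in nbrs))
```

In this instance every link-graph node has degree at least three, which is a prerequisite for (though not by itself proof of) 3-vertex connectivity.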

Lemma 1. In order to uniquely localize all SRLG failures involving up to k links in a network using one monitor, it is necessary that the corresponding link graph of the network is k-vertex-connected and the monitoring node has degree at least k.

Proof. We prove each part of the lemma by contradiction. Assume that the link graph of a given network is not k-vertex-connected. Thus, there exists a set of $k-1$ or fewer nodes in $G'$ whose removal will disconnect $G'$. Consider the set of components, say $C'$, obtained by removing one such set of $k-1$ or fewer nodes in $G'$, say failure $w$. Observe that the monitor in $G$ will have connections to links (nodes in $G'$) that are present in only one of the components in $C'$. Consider a link $\ell$ that belongs to a component that is not connected to the monitor. The failure $w' = w \cup \{\ell\}$ cannot be distinguished from $w$.

Assume that the monitoring node has degree less than k. Thus, the removal of all the links connected to the monitoring station will disconnect the monitor from the rest of the network. Thus, any SRLG failure that has the links connected to the monitor as a subset cannot be uniquely localized. □

Note that k-vertex connectivity of the link graph does not imply k-edge-connectivity of the original graph. However, the converse is true. If the given network is k-edge-connected, then the corresponding link graph is k-vertex-connected. We now show a stronger sufficiency condition to localize all SRLG failures involving up to k links.

Lemma 2. In order to uniquely localize all SRLG failures involving up to k links in a network using one monitor, it is sufficient that the corresponding link graph of the network is k-vertex-connected and the monitoring node has degree at least k.

Proof. We prove the lemma by constructing monitoring tours to distinguish any two SRLG failures, given the network characteristics. Consider the link graph of the given network. Since the link graph is k-vertex-connected, the removal of up to any $k-1$ nodes will keep the link graph connected. Now, consider two failure scenarios $w_1$ and $w_2$, and without loss of generality assume that $|w_1| \ge |w_2|$. If $|w_2| \le k-1$, then the removal of $w_2$ will still keep the link graph connected. Since the monitor is connected to at least k links, there is at least one node in the link graph that is still connected to the monitor. Thus, we may construct an m-tour that traverses a link in $w_1 \setminus w_2$. The tour is affected only by $w_1$ and not by $w_2$.

Consider the scenario that $|w_1| = |w_2| = k$. Since $w_1 \ne w_2$, each set has at least one link that is not present in the other. Let $\ell_1$ be a link that is present in $w_1$ and not in $w_2$. Similarly, let $\ell_2$ be a link that is present in $w_2$ but not in $w_1$. Now, remove all links in $w_2$, except $\ell_2$. Since we have removed at most $k-1$ links, the resultant link graph must be connected. Since the monitor node is attached to at least k links, there is at least one node in the link graph that is still connected to the monitor. If we can construct an m-tour to $\ell_2$ without traversing any of the links in $w_1 \setminus w_2$, then we have a tour that is affected by $w_2$ and not $w_1$. If, however, such a tour traverses a link in $w_1$, we simply trim the path to exclude $w_2$. The resultant tour will be affected by $w_1$ and not $w_2$. □

By combining the above two lemmas, we have the following corollary.

Corollary 1. In order to uniquely localize all SRLG failures involving up to k links in a network using one monitor, it is necessary and sufficient that the corresponding link graph of the network is k-vertex-connected and the monitoring node has degree at least k.
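For small networks, localizability can also be checked by brute force instead of via connectivity. The sketch below is not the construction used in the proofs. It assumes that a tour affected by $w_1$ and not by $w_2$ exists exactly when some link of $w_1 \setminus w_2$ has an endpoint reachable from a monitor after $w_2$ is removed (go out along a path, across the link and back, then return the same way). All function names are hypothetical, and the graph is assumed to be simple.

```python
from itertools import combinations


def reachable(links, removed, sources):
    """Nodes reachable from any source using links that are not removed."""
    adj = {}
    for link in links:
        if link not in removed:
            a, b = tuple(link)
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    seen, stack = set(sources), list(sources)
    while stack:
        for nxt in adj.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def separates(w1, w2, links, monitors):
    """True if some tour can be affected by w1 while avoiding w2."""
    alive = reachable(links, w2, monitors)
    return any(link & alive for link in w1 - w2)


def localizable(links, monitors, k):
    """Check every pair of failures with at most k links (exponential cost)."""
    links = {frozenset(link) for link in links}
    # r = 0 adds the "no failure" case, so every failure must hit some tour.
    failures = [frozenset(c) for r in range(k + 1)
                for c in combinations(list(links), r)]
    for w1, w2 in combinations(failures, 2):
        if not (separates(w1, w2, links, monitors)
                or separates(w2, w1, links, monitors)):
            return False
    return True


# Example network with node 1 as the only monitor (link set is an assumption
# inferred from the tours of the Section 2.1 example).
fig2_links = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]

if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(k, localizable(fig2_links, {1}, k))
```

On this example the check succeeds for k = 3 and fails for k = 4, where the failure of all three monitor links cannot be told apart from the same failure plus link 2–3.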

Thus far, we viewed the localization of SRLG failures as distinguishing between any two SRLG failures. However, the localization process may be viewed differently in order to gain more insight into the topological structure that enables unique localization.

Assume that we are interested in detecting whether link $\ell$, connecting nodes x and y in the network, has failed or not. In particular, we are interested in detecting whether link $\ell$ has failed or not in the presence of other link failures. What is the maximum number of links (other than $\ell$) that could fail such that we may still uniquely determine if link $\ell$ has failed or not? Based on Lemma 1, if link $\ell$ is connected to the monitor (through one of its links in the link graph) after the removal of all failed links, then $\ell$ may be uniquely detected. In other words, $\ell$ is reachable from the monitor even when $k-1$ other links are removed. Note that $\ell$ may be reached from the monitor through node x or y. Thus, if we merge nodes x and y, then this merged node has k link-disjoint paths to the monitor. In addition, if either x or y is a monitor, then the merged node itself becomes a monitor.

4. Monitor placement

In order to uniquely localize up to k link failures with only one monitor, we have proved that k-edge connectivity is a sufficient condition for a given network and that k-vertex connectivity is a necessary and sufficient condition on the corresponding link graph of the network. However, in reality, many networks may not have the required connectivity. Therefore, we may need more monitor locations to localize up to k link failures. The results in the previous section provide the necessary and sufficient conditions on the placement of monitors.

Theorem 2. In order to uniquely localize SRLG failures involving up to k links, it is necessary and sufficient that every connected component that is obtained by removing any $k-1$ links and contains at least one link has a monitor.

Proof. The necessary part of the theorem is proved by contradiction, which may be obtained directly from Lemma 1. Consider a connected component obtained by removing $k-1$ links, say SRLG $w$. If there are no links in this component, i.e., the component is composed of only one node, then the node need not be a monitor as there are no links incident on this node. However, if a connected component contains a link, say $\ell$, and a monitor is not present in the component, then any path from one of the two nodes connected by $\ell$ to the monitor must traverse one or more of the $k-1$ links. Thus, we cannot distinguish the failure of SRLG $w \cup \{\ell\}$ from that of $w$.

The sufficiency part of the theorem is proved by constructing a tour to distinguish any two arbitrary SRLGs $w_1$ and $w_2$. The proof follows along the lines of Lemma 2. Consider the scenario that $|w_1| \ge |w_2|$. Assume that $|w_2| \le k-1$. The removal of $w_2$ will result in some connected components. Since a monitor is present in every component that contains a link that is unique to $w_1$ and not in $w_2$, we may construct an m-tour that is affected by $w_1$ and not $w_2$.

Now, consider the scenario $|w_1| = |w_2| = k$. Let $\ell_1$ be a link that is in $w_1$ and not in $w_2$. Similarly, let $\ell_2$ be a link that is in $w_2$ and not in $w_1$. Remove all links in $w_2$ except $\ell_2$. If, in the resultant components, $\ell_1$ and $\ell_2$ are not in the same connected component, we can construct a tour that is affected by $w_1$ and not $w_2$, and vice versa. However, assume that $\ell_1$ and $\ell_2$ are in the same component. Since this component has a monitor, we find a path from the monitor to $\ell_2$. If such a path does not traverse any link in $w_1 \setminus w_2$, we have a tour that is affected by $w_2$ and not $w_1$. If, however, such a tour traverses a link in $w_1$, we simply trim the path to exclude $w_2$. The resultant tour will be affected by $w_1$ and not $w_2$. □
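The condition of Theorem 2 can be checked for a candidate placement by enumerating link removals. The following is a minimal sketch under stated assumptions: it removes every set of at most $k-1$ links (a conservative superset of "any $k-1$ links", which also covers networks with fewer than $k-1$ links), its cost grows combinatorially with k, and the names `components` and `placement_ok` are hypothetical.

```python
from itertools import combinations


def components(nodes, links):
    """Connected components of an undirected graph, as sets of nodes."""
    adj = {n: set() for n in nodes}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            for nxt in adj[stack.pop()]:
                if nxt not in comp:
                    comp.add(nxt)
                    stack.append(nxt)
        seen |= comp
        comps.append(comp)
    return comps


def placement_ok(nodes, links, monitors, k):
    """True if every link-bearing component left after removing up to
    k - 1 links contains at least one monitor."""
    links = list(links)
    monitors = set(monitors)
    for r in range(k):  # r = 0 .. k-1 removed links
        for removed in combinations(links, r):
            kept = [link for link in links if link not in removed]
            for comp in components(nodes, kept):
                has_link = any(a in comp for a, _ in kept)
                if has_link and not (comp & monitors):
                    return False
    return True


# Example network (link set inferred from the Section 2.1 tours; an assumption).
nodes = [1, 2, 3, 4]
fig2_links = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4)]

if __name__ == "__main__":
    print(placement_ok(nodes, fig2_links, {1}, 3))
    print(placement_ok(nodes, fig2_links, {1}, 4))
    print(placement_ok(nodes, fig2_links, {1, 3}, 4))
```

In this instance a single monitor at node 1 passes for k = 3 but not for k = 4, while monitors at nodes 1 and 3 pass for k = 4 because every link is incident on one of them.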

Although the above theorem provides the necessary and sufficient conditions on the placement of monitors, the conditions are based not on a component's connections to other components but on whether a component has a link inside or not. This property does not lend itself to any easier form of computing the minimum number of monitors required.

Fig. 4. ILP procedure for obtaining the minimum number of monitor stations.

Fig. 6. Procedure to construct the set of monitoring tours to localize all SRLG failures involving up to k links.

In certain situations, the computation of the minimum number of monitors is an NP-Hard problem. For example, consider a network where the degree of every node is upper bounded by d. Assume that we need to compute the minimum number of monitors required to uniquely localize any SRLG failures involving up to $2d-1$ failures. Based on the necessary conditions on the placement of monitors, the components obtained by removing any $2d-2$ links must all have a monitor if the component contains a link. Now consider a link $\ell$ that connects nodes x and y. The number of links incident on x and y, excluding link $\ell$, is at most $2d-2$. Thus, the removal of all links other than $\ell$ at x and y will result in a component with only x and y, with link $\ell$ connecting them. Thus, either x or y needs to be a monitor. The same argument may be applied to any link in the network. Computing the minimum number of monitors (and their placement) is therefore also a solution to the vertex-cover problem. If we could solve the monitor placement problem in polynomial time, then the vertex-cover problem could be solved in polynomial time. Thus, computing the minimum number of monitors required to localize SRLG failures is an NP-Hard problem when the number of simultaneous failures exceeds a certain threshold.

We develop an integer linear program formulation and a greedy algorithm to compute the monitors required to uniquely localize all SRLG failures involving up to k links. Let x_n be a binary variable that indicates whether node n is a monitor. Let C_1 through C_R denote the connected components, each containing at least one link, obtained by removing k − 1 links. Fig. 4 shows the integer linear program (ILP) formulation for computing the minimum number of monitors and their placement. Solving the ILP yields the minimum number of monitoring stations; however, the worst-case computational complexity is exponential in the number of nodes.

Fig. 5. Pseudocode for the greedy heuristic to obtain monitor placement.
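As an illustration of the placement problem stated above, the following Python sketch finds a minimum monitor set by exhaustive search rather than by an ILP solver. It is not the formulation of Fig. 4: the function names `link_components` and `min_monitors`, the edge-list graph representation, and the choice to enumerate every removal of up to k − 1 links are assumptions made for this example, and the search is exponential in the network size.

```python
from itertools import combinations


def link_components(nodes, links, removed):
    """Connected components (frozensets of nodes) that still contain at least
    one link after deleting the links whose indices are in `removed`."""
    adj = {n: set() for n in nodes}
    for i, (u, v) in enumerate(links):
        if i not in removed:
            adj[u].add(v)
            adj[v].add(u)
    seen, comps = set(), []
    for start in nodes:
        if start in seen or not adj[start]:
            continue  # already visited, or an isolated node with no link
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in adj[u] - comp:
                comp.add(v)
                stack.append(v)
        seen |= comp
        comps.append(frozenset(comp))
    return comps


def min_monitors(nodes, links, k):
    """Smallest node set that places a monitor in every link-bearing
    component left after removing up to k - 1 links (brute force)."""
    required = set()
    for r in range(k):  # remove 0, 1, ..., k - 1 links
        for removed in combinations(range(len(links)), r):
            required.update(link_components(nodes, links, set(removed)))
    for size in range(len(nodes) + 1):
        for cand in combinations(nodes, size):
            if all(set(cand) & comp for comp in required):
                return set(cand)


# Triangle network: every node has degree d = 2, so 2d - 1 = 3.
triangle_nodes = [0, 1, 2]
triangle_links = [(0, 1), (1, 2), (0, 2)]
print(min_monitors(triangle_nodes, triangle_links, 1))
print(min_monitors(triangle_nodes, triangle_links, 3))
```

On the triangle, k = 3 leaves each single link as its own component, so the monitor set must be a vertex cover, which mirrors the NP-hardness argument above.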

4.1. Heuristic approach

We develop a greedy heuristic that is simple and intuitive. Fig. 5 shows the two steps involved in the greedy heuristic. The first step runs through all the connected components that have at least one link, computed by removing k − 1 links, and assigns a monitor to each component that does not already have one. Note that although this step checks whether a connected component has a monitor, it may still place more monitors than necessary. For example, assume that C_1 and C_2 are two components and that C_2 ⊂ C_1. If the procedure visits component C_1 first, it could select a node that is present in C_1 but not in C_2. When the step considers component C_2, it will assign another monitor to a node in C_2. However, given that a node in C_2 has a monitor, the monitor placed earlier when considering C_1 is no longer needed. The second step in the procedure therefore attempts to eliminate monitors when possible: it checks whether a monitor can be removed without violating the constraints, in order to obtain a minimal set of monitors.
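The two steps described above can be sketched in Python as follows. This is only an illustration, not the pseudocode of Fig. 5: the function name `greedy_monitors`, the representation of components as Python sets, and the rule of picking the smallest node identifier in an uncovered component are assumptions made for the example.

```python
def greedy_monitors(components):
    """Two-step greedy placement over a list of components (sets of nodes)."""
    monitors = set()

    # Step 1: visit each component and add a monitor if it has none yet.
    for comp in components:
        if not (monitors & comp):
            monitors.add(min(comp))  # assumed tie-break: smallest node id

    # Step 2: drop any monitor whose removal keeps every component covered.
    for m in sorted(monitors):
        trial = monitors - {m}
        if all(trial & comp for comp in components):
            monitors = trial

    return monitors


# The example from the text: C_2 is a proper subset of C_1 and C_1 is visited first.
c1, c2 = {1, 2, 3}, {2, 3}
print(greedy_monitors([c1, c2]))
```

In this run, step 1 places monitors at nodes 1 and 2, and step 2 removes the monitor at node 1 because node 2 already covers both components.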

5. Computing monitoring tours for a given monitor placement

Given a network and the locations of the monitors that are sufficient to localize all SRLG failures involving up to k links, we compute the set of monitoring tours.³ The procedure to construct monitoring tours to uniquely localize all SRLG failures with up to k links is shown in Fig. 6. For a given set of monitoring locations, we merge them into a super monitor node J. We call this process the graph transform.
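A minimal Python sketch of the graph transform is given below, assuming the network is stored as a list of links (node pairs). The function name `graph_transform` and the decision to keep parallel links and self-loops, so that link positions in the list still identify the original links, are assumptions made for this example.

```python
def graph_transform(links, monitors, super_node="J"):
    """Merge all monitor nodes into one super node.

    Links keep their position in the list, so index i in the result is the
    same physical link as index i in the input. A link between two monitors
    becomes a self-loop at the super node.
    """
    def relabel(node):
        return super_node if node in monitors else node

    return [(relabel(u), relabel(v)) for u, v in links]


# Hypothetical five-link network with monitors at nodes 1 and 3.
links = [(1, 2), (2, 3), (3, 4), (1, 4), (1, 3)]
print(graph_transform(links, monitors={1, 3}))
```

The last link joins two monitors and appears as a loop at J in the transformed graph; such a loop corresponds to a monitoring path between two stations in the original graph.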

³ Recall that tours include cycles and paths as well.


Fig. 7. An example of graph transform to localize up to three link failures. The node with the red mark is the monitor station. (a) Original graph with two monitoring stations; (b) the transformed graph of (a).

Fig. 9. Comparison of the number of monitor stations obtained from ILP and greedy algorithms on four network topologies for localizing up to k link failures.


Fig. 7 shows an example of the graph transform. In the transformed graph, we have only one monitoring station. In order to localize all SRLG failures, every SRLG must have a distinct syndrome and every SRLG failure must affect at least one monitoring tour. If two SRLG failures have the same syndrome, then the two failures cannot be uniquely localized. To localize all SRLGs, we randomly choose two SRLGs, w_1 and w_2, that have the same syndrome and add a tour t that traverses the super node J and a link ℓ ∈ w_1 \ w_2 to the feasible solution set T.

In the transformed graph, if a link ℓ ∈ w_1 \ w_2 is a loop at node J, then the loop is a monitoring path connecting a pair of monitoring stations in the original graph. Otherwise, we always construct a tour that traverses the monitor station J and a link ℓ ∈ w_1 \ w_2. We use cycles whenever they are available, since a cycle traverses a higher number of distinct links than a tour that simply goes forward and backward on a path. However, if there is no two-edge connected component after removing w_2 in the transformed graph, then we cannot construct cycles and only tours will be used. If a path from node J to link ℓ ∈ w_1 \ w_2 does not exist after the removal of w_2, then we remove w_1 and construct a cycle/tour passing through a link ℓ ∈ w_2 \ w_1. This process is repeated until every SRLG has a distinct syndrome. Step 5 in the procedure to construct monitoring tours simply guarantees that every SRLG affects at least one tour/cycle. A monitoring tour that includes a monitoring cycle in the transformed graph will be a monitoring tour containing a monitoring cycle or path in the original graph.
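The stopping condition of this procedure can be checked with a short Python sketch. It assumes that the syndrome of an SRLG is the set of tours that traverse at least one of its links, that tours and SRLGs are given as collections of link identifiers, and that the helper names `syndrome` and `ambiguous_pairs` are hypothetical. It does not construct tours; it only reports which SRLGs still need a new tour.

```python
from itertools import combinations


def syndrome(srlg, tours):
    """Indices of the tours that fail when every link in `srlg` fails."""
    return frozenset(i for i, tour in enumerate(tours) if set(tour) & set(srlg))


def ambiguous_pairs(srlgs, tours):
    """Return (pairs of SRLGs sharing a syndrome, SRLGs affecting no tour)."""
    pairs = [(a, b) for a, b in combinations(srlgs, 2)
             if syndrome(a, tours) == syndrome(b, tours)]
    undetected = [s for s in srlgs if not syndrome(s, tours)]
    return pairs, undetected


# Three single-link SRLGs on links "a", "b", "c" (hypothetical identifiers).
srlgs = [("a",), ("b",), ("c",)]

# One tour over links a and b: a and b are indistinguishable, c is unseen.
print(ambiguous_pairs(srlgs, [["a", "b"]]))

# Adding a tour over links b and c gives every SRLG a distinct syndrome.
print(ambiguous_pairs(srlgs, [["a", "b"], ["b", "c"]]))
```

Each reported pair corresponds to a choice of w_1 and w_2 in the text, and each undetected SRLG is a case that the final step of the procedure has to cover.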

Fig. 8. Four real-life network topologies used in the simulation.

5.1. Complexity

The time complexity of constructing a tour passing through a monitor node J and link ℓ is O(|N| log |N| + |L|). In the worst case, we require at most |W| tours to localize |W| failures [13]. Therefore, the worst-case complexity of our algorithm is O(|W||N| log |N| + |W||L|).

6. Performance evaluation

In this section, we compare the number of required monitoring stations obtained with the greedy approach with that obtained using the ILP on four networks: NJ-LATA, NSFNET, ARPANET, and Node-16, as shown in Fig. 8. NJ-LATA and NSFNET are two-edge connected networks and the other two are three-edge connected networks. Fig. 9 provides the simulation results comparing our greedy approach to the ILP solution introduced in Section 4. ILOG CPLEX 8.1 is used to obtain the ILP solution [16]. From the table, we can see that the number of required monitoring stations increases with increasing k, since the removal of any k − 1 links generates many components that each need a monitoring location. The table also shows that our greedy algorithm matches the optimal solution for all scenarios with up to five link failures, except for the Node-16 topology with k = 5. Notice that the NJ-LATA and NSFNET networks need only one monitoring location to localize up to three link failures, although they are only two-edge connected.

Fig. 10 shows the running time of the greedy approach on the four real-life networks. From the figure, we observe that the computation time increases exponentially as the value of k increases for each topology, and that it is proportional to the number of nodes for the same value of k.

Fig. 10. The running time of the greedy approach on the four real-life networks. The time unit is seconds.

Next, we use our heuristic algorithm to find a feasible solution set, that is, a set of m-tours. We evaluate the performance of m-tours against the algorithm using monitoring cycles and paths [13]. We consider four performance metrics: (1) the number of required monitoring locations, (2) the average tour and cycle (path) length, which reflects the amount of resources consumed per m-tour and monitoring cycle (path), (3) the total number of tours and cycles (paths) for a given arrangement of monitors, and (4) the average number of tours/cycles traversing a link. The fourth metric is defined as:

\[
\frac{\sum_{t \in T \, (c \in C)} l_{t \, (c)}}{|L|}
\]

where T and C are the sets of all tours and cycles that traverse a monitor station, and l_t (l_c) is the number of links in a tour t (cycle c).
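As a small worked example of the fourth metric, the Python sketch below sums the tour lengths and divides by the number of links. The function name `avg_tours_per_link` is hypothetical, and counting each traversal of a link separately (so that a link traversed forward and backward contributes two to the length of a tour) is an assumption of this example.

```python
def avg_tours_per_link(tours, num_links):
    """Sum of tour lengths divided by |L|.

    Each tour is a list of link identifiers in traversal order, so a link
    traversed twice by the same tour is counted twice (assumed convention).
    """
    return sum(len(tour) for tour in tours) / num_links


# Hypothetical network with 3 links and two tours of lengths 2 and 3.
tours = [["a", "b"], ["b", "c", "a"]]
print(avg_tours_per_link(tours, num_links=3))
```

Here the total length is 5 over 3 links, so on average each link is traversed by about 1.67 tours.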

Note that when tours are employed, the number of monitoring stations required is reduced significantly.

Fig. 11. Performance for single link failure localization. (a) Number of monitoring stations, (b) average number of tours/cycles per link, (c) number of tours/cycles, (d) average length of a tour/cycle. Each panel compares Tour against Cycle (Path) on NJ-LATA, NSFNET, Node-16, and ARPANET.

Minimizing the number of monitoring stations can simplify network management and reduce the cost of set-up and operation. Thus, we mainly show the amount by which our approach decreases the number of monitors required to localize both single and multiple link failures compared to the approach employing cycles and paths. For this purpose, we use real-life networks and generate three arbitrary connected networks with 20, 60, and 100 nodes. In the given networks, we find the placement of multiple monitoring locations. Given the set of monitoring locations, we merge these monitoring locations and apply the graph transformation to find a feasible solution set composed of m-tours. Every histogram in Figs. 12, 13, and 15 reflects the average of 10 runs on different topologies that have the same number of nodes and the same average node degree.

6.1. Results and discussion

Fig. 11 shows the four performance metrics on the four networks shown in Fig. 8 under the single link failure scenario. From Fig. 11(a), the scheme using monitoring cycles and paths requires three and two monitor locations on the NJ-LATA and NSFNET networks, respectively, since they are two-edge connected. However, our scheme requires only one monitoring station on each of the four topologies since each network is connected, which satisfies Theorem 1.

Fig. 11(b) shows the average number of tours and cycles (paths) per link. From the figure we observe that the approach using monitoring tours requires more monitoring probes per link than that using monitoring cycles (paths) for the two-edge connected networks. This is because the approach using monitoring cycles and paths places multiple monitor stations on the networks. Therefore, many links are localized by nearby monitoring stations.

Fig. 12. Single link failure localization results using two methods on arbitrary topologies with 20, 60, and 100 nodes.

Fig. 13. Average number of tours and cycles (paths) per link in three random topologies. (a) 20 nodes; (b) 60 nodes; (c) 100 nodes.

Fig. 14. The number of monitors for the three simulation scenarios on random networks with n = 20 and the average node degree three.

Fig. 11(c) and (d) show that our scheme performs as well as the other scheme even though it requires only a single monitor location. Notice that our scheme and the other scheme perform equally on the Node-16 and ARPANET topologies. This is because the approach using m-tours can always find cycles on these two topologies. In conclusion, our approach has almost the same performance as the other scheme while reducing the number of monitor stations in the networks.

For each of the four performance metrics, we plot the performance ratio of the scheme using monitoring cycles and paths to that using m-tours for single link failure scenarios in random topologies. We make several observations from Fig. 12, as follows.

The scheme using m-tours uses one monitor location to localize single link failures, while the scheme using only monitoring paths and cycles needs many monitoring locations. For sparse networks (average node degree two), about 70% of the nodes must have monitors in order to localize single link failures with the algorithm using monitoring cycles and paths, as shown in Fig. 12(a). For example, for networks with 100 nodes and an average node degree of two, the scheme using monitoring cycles and paths requires about 70 monitor locations. In contrast, for sparse networks, the scheme using m-tours drastically reduces the number of required monitor locations, which is a clear improvement over using cycles and paths.


Fig. 15. Single, two, and three link failure localization results using three methods on arbitrary topologies with 20 nodes and average node degree three. Panels, each plotted for k = 1, 2, and 3: (a) average number of monitoring stations, (b) average number of tours and cycles (paths) per link, (c) average number of tours and cycles (paths), (d) average length of tours and cycles (paths). Series: Tour (original), Tour (more monitors), and Cycle (Path).


For sparse networks, the ratio of the average total length sum is small compared to other topologies, as shown in Fig. 12(b). This is because for such networks the scheme using monitoring cycles and paths requires many monitoring locations, which means many links can be localized by nearby monitoring stations. Thus, the total length sum increases as the node degree increases, and the figure shows an increasing ratio with increasing node degree. However, the ratio is still less than one, because the scheme using monitoring cycles and paths uses multiple monitoring stations for all node degrees. Fig. 12(d) shows similar results. The ratio approaches one for dense networks (average degree 5). Notice that these two ratios are not unfavorable even in sparse networks when compared to the ratio of the number of required monitoring locations.

Fig. 12(c) shows the ratio of the average number of monitoring cycles (paths) to m-tours. The ratio is almost one for the topologies with degree three or more. For sparse networks, the scheme using m-tours requires more tours to localize single link failures than the scheme using monitoring cycles and paths. This is because one cycle operation can localize more SRLG failures than a tour operation.

Fig. 13 shows the average number of tours/cycles (paths) required per link in random topologies under a single link failure scenario. From the figure, we observe the following:

• The average number of tours per link decreases and approaches that of cycles (paths) per link as the average node degree increases. This is because the scheme using m-tours constructs more cycles than tours as the average node degree increases.

• The average number of cycles (paths) per link increases as the average node degree increases. This is because the number of monitors required decreases as the average node degree increases.

• The average number of tours per link increases as the number of nodes increases under the same average node degree. This is because, with a single monitor station, tours in large networks are longer than those in small networks.

Finally, we consider multiple link failure scenarios (k = 2 and 3) with the two schemes. In Fig. 15, we plot the four performance metric values assuming the number of nodes is 20 and the average node degree is three. The white histograms represent the results employing monitoring tours, while the black histograms represent the results employing monitoring cycles (paths). The approach employing m-tours needs only 1, 1.2, and 1.9 monitoring stations on average to localize up to 1, 2, and 3 link failures, respectively. On the other hand, the scheme employing monitoring cycles (paths) requires about 8, 12, and 16 monitor locations, as shown in Fig. 15(a). To inspect the effectiveness of using tours, we assign about half of the monitors used in the approach employing monitoring cycles (paths) to a given network and apply the approach using monitoring tours. The gray histograms show the simulation results of this modified method. The table in Fig. 14 shows the number of monitors required for the three simulation scenarios.

From Fig. 15(b) through (d), we observe that the performance using tours approaches that using cycles (paths) if we add only about half of the monitors used in the approach using monitoring cycles (paths). Therefore, we conclude that the approach employing monitoring tours increases the flexibility of monitoring probes and is better than the technique using monitoring cycles (paths) in terms of the total cost defined in [14,15].

7. Conclusion

In this paper, we considered the problem of localizing SRLG failures involving up to k links using monitoring tours. We showed that the ability to traverse a link in both directions greatly relaxes the necessary and sufficient conditions on the network topology for unique localization. We developed necessary and sufficient conditions on the placement of monitors and showed that, in some scenarios, computing the minimum number of monitors is an NP-Hard problem. We developed an ILP formulation and a greedy heuristic to compute the number of monitors required in an arbitrary network. We also developed an algorithm to compute monitoring tours for a given placement of monitors. We showed through experimental evaluations that the use of tours greatly reduces the number of monitors required to uniquely localize SRLG failures compared to using cycles/paths. The reduction in the number of monitors required is accompanied by an increase in the number of tours and the average/total length of tours as compared to the number of cycles/paths and the average/total length of cycles/paths. However, a small increase in the number of monitors can drastically decrease the number of tours. From these observations, we conclude that the approach using monitoring tours can enrich the flexibility of monitoring probes and decrease the total cost compared to the scheme that uses only monitoring cycles and paths for network management.

Acknowledgement

The research developed in this paper is funded in part by the National Science Foundation under grant CNS-1117274.

References

[1] S. Stanic, S. Subramaniam, H. Choi, G. Sahin, H.-A. Choi, On monitoring transparent optical networks, in: Proceedings of the 2002 International Conference on Parallel Processing Workshops, 2002.

[2] C.-S. Li, R. Ramaswami, Automatic fault detection, isolation, and recovery in transparent all-optical networks, Journal of Lightwave Technology 15 (10) (1997) 1784–1793.

[3] A. Pal, A. Paul, A. Mukherjee, M. Naskar, M. Nasipuri, in: Fault Detection and Localization Scheme for Multiple Failures in Optical Network, Lecture Notes in Computer Science, Springer, Berlin/Heidelberg, 2008.

[4] B. Wu, K.L. Yeung, M2-cycle: an optical layer algorithm for fast link failure detection in all-optical mesh networks, in: Proceedings of IEEE GLOBECOM, 2006, pp. 1–5.

[5] B. Wu, K.L. Yeung, Monitoring cycle design for fast link failure detection in all-optical networks, in: Proceedings of IEEE GLOBECOM, 2007, pp. 2315–2319.

[6] J. Tapolcai, B. Wu, P.-H. Ho, On monitoring and failure localization in mesh all-optical networks, in: Proceedings of IEEE INFOCOM, April 2009, pp. 1008–1016.

[7] B. Wu, P.-H. Ho, K.L. Yeung, Monitoring trail: on fast link failure localization in all-optical WDM mesh networks, IEEE/OSA Journal of Lightwave Technology 27 (18) (2009).

[8] Y. Wen, V.W.S. Chan, L. Zheng, Efficient fault diagnosis algorithms for all-optical WDM networks with probabilistic link failures, IEEE/OSA Journal of Lightwave Technology 23 (10) (2005) 3358–3371.

[9] N. Harvey, M. Patrascu, Y. Wen, S. Yekhanin, V.W.S. Chan, Non-adaptive fault diagnosis for all-optical networks via combinatorial group testing on graphs, in: IEEE INFOCOM, May 2007, pp. 697–705.

[10] B. Wu, P.-H. Ho, J. Tapolcai, X. Jiang, A novel framework of fast and unambiguous link failure localization via monitoring trails, in: Proceedings of IEEE INFOCOM, March 2010, pp. 1–5.

[11] J. Tapolcai, L. Ronyai, P.-H. Ho, Optimal solutions for single fault localization in two dimensional lattice networks, in: Proceedings of IEEE INFOCOM, 14–19 March 2010, pp. 1–5.

[12] S.S. Ahuja, S. Ramasubramanian, M. Krunz, Single-link failure detection in all-optical networks using monitoring cycles and paths, IEEE/ACM Transactions on Networking (2009).

[13] S. Ahuja, S. Ramasubramanian, M. Krunz, SRLG failure localization in optical networks, IEEE/ACM Transactions on Networking 19 (4) (2011) 989–999.

[14] P. Babarczi, J. Tapolcai, P.-H. Ho, Adjacent link failure localization with monitoring trails in all-optical mesh networks, IEEE/ACM Transactions on Networking (2011).

[15] J. Tapolcai, P.H. Ho, L. Rónyai, P. Babarczi, B. Wu, Failure localization for shared risk link groups in all-optical mesh networks using monitoring trails, IEEE/OSA Journal of Lightwave Technology (2011).

[16] IBM, CPLEX. <http://www.cplex.com>.

Sangman Cho received the B.E. degree in Electronics Engineering from Inha University, Incheon, South Korea, in 2000, the M.E. degree in computer and communication engineering from POSTECH, Pohang, South Korea, in 2003, and the Ph.D. degree in Computer Engineering from the University of Arizona, Tucson, in 2010. He held research positions at Samsung Electronics, Suwon, South Korea, and Bellwave, Seoul, South Korea. His research interests include network monitoring, fault localization, multi-path routing, and channel access schemes in multi-channel wireless networks.

Srinivasan Ramasubramanian received the B.E. (Hons.) degree in Electrical and Electronics Engineering from Birla Institute of Technology and Science (BITS), Pilani, India, in 1997, and the Ph.D. degree in Computer Engineering from Iowa State University, Ames, in 2002. He is currently an Associate Professor in the Department of Electrical and Computer Engineering at the University of Arizona, where he held the position of Assistant Professor from August 2002 to July 2008. He is a co-developer of the Hierarchical Modeling and Analysis Package (HIMAP), a reliability modeling and analysis tool, which is currently being used at Boeing, Honeywell, and several other companies and universities. His research interests include architectures and algorithms for optical and wireless networks, multipath routing, fault tolerance, monitoring and localization, network tomography, and performance analysis. He has served as the TPC Co-Chair of the BROADNETS 2005, ICCCN 2008, and ICC 2010 conferences and the LANMAN 2010 Workshop. He served on the editorial board of the Springer Wireless Networks Journal from 2005 to 2009. He is currently an Associate Editor for the IEEE/ACM Transactions on Networking.