Networking issues in wireless sensor networks

16
J. Parallel Distrib. Comput. 64 (2004) 799–814 Networking issues in wireless sensor networks Deepak Ganesan, a, * Alberto Cerpa, a Wei Ye, b Yan Yu, a Jerry Zhao, b and Deborah Estrin a a Department of Computer Science, University of California, 4531 Boelter Hall, Los Angeles, CA 90095-1596, USA b USC/ISI 4676, Admiralty Way, Marina Del Rey, CA 90292, USA Received 25 June 2003; revised 10 December 2003 Abstract The emergence of sensor networks as one of the dominant technology trends in the coming decades (Technol. Rev. (February 2003)) has posed numerous unique challenges to researchers. These networks are likely to be composed of hundreds, and potentially thousands of tiny sensor nodes, functioning autonomously, and in many cases, without access to renewable energy resources. Cost constraints and the need for ubiquitous, invisible deployments will result in small sized, resource-constrained sensor nodes. While the set of challenges in sensor networks are diverse, we focus on fundamental networking challenges in this paper. The key networking challenges in sensor networks that we discuss are: (a) supporting multi-hop communication while limiting radio operation to conserve power, (b) data management, including frameworks that support attribute-based data naming, routing and in- network aggregation, (c) geographic routing challenges in networks where nodes know their locations, and (d) monitoring and maintenance of such dynamic, resource-limited systems. For each of these research areas, we provide an overview of proposed solutions to the problem and discuss in detail one or few representative solutions. Finally, we illustrate how these networking components can be integrated into a complex data storage solution for sensor networks. r 2004 Elsevier Inc. All rights reserved. 1. Introduction The availability of micro-sensors and low-power wireless communications will enable the deployment of densely distributed sensor/actuator networks for a wide range of applications [41]. Application domains are diverse and can encompass a variety of data types including acoustic, image, and various chemical and physical properties. These sensor nodes will perform significant signal processing, computation, and network self-configuration to achieve scalable, robust and long- lived networks [11]. More specifically, sensor nodes will do local processing to reduce communications, and consequently, energy costs. Akyildiz et al. [3] provide a comprehensive overview of different aspects of research in sensor networks. In this paper, we will take a more in- depth look at networking challenges, including more recent techniques in this area. Sensor networks pose interesting challenges for networking research. Foremost among these is the development of long-lived sensor networks in spite of energy-constraints of individual nodes. Sensor nodes are expected to be battery equipped, and deployed in a variety of terrains. In some of these deployments, it may be feasible to harness energy from ambient sources, such as solar power, whereas in others such as climate monitoring in the canopies, sensor nodes may not be able to renew their energy resources. A major energy consumer is radio communication [32]. A comparison of the cost of computation to commu- nication in future platforms by Pottie and Kaiser [32] reveals that 3000 instructions can be executed for the same cost as the transmission of one bit over 100 m: Energy conservation in such networks involves two dominant approaches to minimize communication over- head. The first is at the MAC and networking layers, where nodes turn off their radios when they are not required for multihop communication (adaptive duty cycling). The second is data reduction through in- network processing (also called data aggregation), whereby correlations in data are exploited [14,22] to reduce the size of data, and correspondingly commu- nication cost. The large number of nodes expected in sensor network deployments, and the unpredictable nature of ARTICLE IN PRESS *Corresponding author. E-mail address: [email protected] (D. Ganesan). 0743-7315/$ - see front matter r 2004 Elsevier Inc. All rights reserved. doi:10.1016/j.jpdc.2004.03.016

Transcript of Networking issues in wireless sensor networks

J. Parallel Distrib. Comput. 64 (2004) 799–814

ARTICLE IN PRESS

*Correspond

E-mail addr

0743-7315/$ - se

doi:10.1016/j.jp

Networking issues in wireless sensor networks

Deepak Ganesan,a,* Alberto Cerpa,a Wei Ye,b Yan Yu,a Jerry Zhao,b and Deborah Estrina

aDepartment of Computer Science, University of California, 4531 Boelter Hall, Los Angeles, CA 90095-1596, USAbUSC/ISI 4676, Admiralty Way, Marina Del Rey, CA 90292, USA

Received 25 June 2003; revised 10 December 2003

Abstract

The emergence of sensor networks as one of the dominant technology trends in the coming decades (Technol. Rev. (February

2003)) has posed numerous unique challenges to researchers. These networks are likely to be composed of hundreds, and potentially

thousands of tiny sensor nodes, functioning autonomously, and in many cases, without access to renewable energy resources. Cost

constraints and the need for ubiquitous, invisible deployments will result in small sized, resource-constrained sensor nodes.

While the set of challenges in sensor networks are diverse, we focus on fundamental networking challenges in this paper. The key

networking challenges in sensor networks that we discuss are: (a) supporting multi-hop communication while limiting radio

operation to conserve power, (b) data management, including frameworks that support attribute-based data naming, routing and in-

network aggregation, (c) geographic routing challenges in networks where nodes know their locations, and (d) monitoring and

maintenance of such dynamic, resource-limited systems. For each of these research areas, we provide an overview of proposed

solutions to the problem and discuss in detail one or few representative solutions. Finally, we illustrate how these networking

components can be integrated into a complex data storage solution for sensor networks.

r 2004 Elsevier Inc. All rights reserved.

1. Introduction

The availability of micro-sensors and low-powerwireless communications will enable the deployment ofdensely distributed sensor/actuator networks for a widerange of applications [41]. Application domains arediverse and can encompass a variety of data typesincluding acoustic, image, and various chemical andphysical properties. These sensor nodes will performsignificant signal processing, computation, and networkself-configuration to achieve scalable, robust and long-lived networks [11]. More specifically, sensor nodes willdo local processing to reduce communications, andconsequently, energy costs. Akyildiz et al. [3] provide acomprehensive overview of different aspects of researchin sensor networks. In this paper, we will take a more in-depth look at networking challenges, including morerecent techniques in this area.Sensor networks pose interesting challenges for

networking research. Foremost among these is thedevelopment of long-lived sensor networks in spite of

ing author.

ess: [email protected] (D. Ganesan).

e front matter r 2004 Elsevier Inc. All rights reserved.

dc.2004.03.016

energy-constraints of individual nodes. Sensor nodesare expected to be battery equipped, and deployedin a variety of terrains. In some of these deployments,it may be feasible to harness energy from ambientsources, such as solar power, whereas in others suchas climate monitoring in the canopies, sensor nodesmay not be able to renew their energy resources.A major energy consumer is radio communication [32].A comparison of the cost of computation to commu-nication in future platforms by Pottie and Kaiser [32]reveals that 3000 instructions can be executed for thesame cost as the transmission of one bit over 100 m:Energy conservation in such networks involves twodominant approaches to minimize communication over-head. The first is at the MAC and networking layers,where nodes turn off their radios when they are notrequired for multihop communication (adaptive duty

cycling). The second is data reduction through in-

network processing (also called data aggregation),whereby correlations in data are exploited [14,22] toreduce the size of data, and correspondingly commu-nication cost.The large number of nodes expected in sensor

network deployments, and the unpredictable nature of

ARTICLE IN PRESSD. Ganesan et al. / J. Parallel Distrib. Comput. 64 (2004) 799–814800

deployment conditions introduce significant scalabilityand reliability concerns as well. Increased levels ofsystem dynamics can be expected, including permanentor intermittent node failures. These dynamics arecompounded by the vagaries of the wireless channel,introducing environmental dynamics. Designing long-lived and unattended systems under such conditionsimplies that the system itself must carry out themeasurement and adaptive configuration in an energyconstrained fashion.Techniques to tackle these challenges fall into four

broad categories:

1.1. Limiting radio operation

The dominant use of radios is for multi-hop commu-nications, usually involving nodes sending data to oneor more data sinks, or base-station. An ideal powerconservation policy would switch radios off when anode is not required either as a data source or relay insuch multi-hop routing. Powering down radios onsensor nodes, however, renders such nodes unavailablefor multi-hop communication. A naive policy that shutsdown radios on sensor nodes when they are not in usecan cause regions of the network to be inaccessible byqueries, due to routing partitions, especially in the faceof environmental dynamics. How can sensor systemsmaintain communication backbones, and enable multi-hop routing while powering down radios for powerconservation? Two techniques that have been exploredto solve this problem are

* Adaptive duty-cycling (S-MAC [44,45], ASCENT [7],SPAN [8]): Here, the set of nodes whose radios arepowered down are carefully chosen such that anetwork backbone is continually maintained, whilenon-backbone nodes can put their radios to sleep. Toload-balance, and thus prevent nodes from dying,the active subset of nodes are adaptively cycled, basedon parameters such as available energy, radio cover-age, etc.

* Wakeup on demand (STEM [35], Wake-on-Wireless[36]): This technique uses nodes with multiple radios,a low-power radio (such as a mote radio [20]) that isused exclusively to wake up the high power radio(such as 802.11) when the need arises, much like apaging channel in cellular networks. Such a techniqueis especially useful in ad hoc networks or in sensornetworks with image data, where bandwidths anddata-rates are higher, and warrants the use ofmultiple radios.

1.2. Data management

Sensor networks are intended to collect and actuateon data about the physical world, hence their use is

expected to be highly data-centric. Unlike traditionalend-to-end networking techniques, routing and datamanagement need to be performed jointly in sensornetworks in order to maximize energy savings. There-fore, a significant networking component is to provide aflexible platform to build data management frameworksthat use various application-specific data aggregationschemes.A key aspect of data management is data naming.

While many attribute-based naming mechanisms havebeen proposed over an IP framework for the internet(e.g. [2,21,31]), sensor networks pose unique challengesdue to their data and resource constraints.Foremost among these is the disparity between the

amount of data generated by sensors and the amount ofdata that a network can communicate before depletinglimited energy resources. For example, in an environ-mental monitoring application such as [16], a node hasmultiple weather sensors, each generating samples onceevery few seconds. If each mote sensor node transmittedall data, its lifetime would be limited to a few weeks, asopposed to a target lifetime of many years. Two factorscan be exploited to reduce the amount of datacommunicated: (a) not all data is necessary for users,hence only interesting event detections or user-requireddata need to be selectively communicated, and (b) in adense sensor network, significant correlation in data canbe expected, and can be exploited to reduce the size ofdata. An important aspect of data management insensor network is support for such in-network datareduction techniques.The second challenge is resource constraints on

embedded sensor nodes. The capabilities of these devicescan range from Ipaq-class devices, to highly constrainedmica motes [20]. Traditional database methods areresource-intensive, whereas data management schemesfor sensor networks need to function on resource-limitednodes.

1.3. Geographic routing in sensor networks

Routing in sensor networks differs from routing inboth ad hoc wireless networks and the Internet in twoways: The first is that it is attribute-based and oftenincludes geographical location. The second is thatenergy constraints, network dynamics and deploymentscale preclude proactive or global schemes for routing.Routing schemes that operate primarily on local

information are more appropriate, since these can bereactive to local changes, while not requiring energyexpensive global transfers of routing tables. Suchreactive approaches are potentially more energy efficientas well, since sensor networks are expected to havebursty traffic.We describe different approaches to such routing

schemes in the context of sensor networks. Early

ARTICLE IN PRESSD. Ganesan et al. / J. Parallel Distrib. Comput. 64 (2004) 799–814 801

diffusion schemes [22], and TinyDB construct opportu-nistic routing trees to route data to the sink or base-station. Recent research in localization schemes have,however, made it likely that cheap and precise locationinformation can be obtained even in some networkswhere all nodes cannot be GPS enabled. This develop-ment indicates that a dominant form of routing insensor networks is likely to be geographically driven,since location attribute can reduce communication costin the following ways:

* Sensor data is likely to be geographically correlated.Data reduction or aggregation schemes would needto route geographically to exploit such correlations(e.g. dimensions [14]).

* Queries that are geographically scoped are likely inmany applications where users would prefer to querya small geographical region rather than the entirenetwork. For instance, in a tracking application, thequery is efficiently answered by querying only nodeson the trajectory of the target rather than all nodes inthe network. Similarly, weather monitoring that istargeted at understanding local characteristics of datarather than global ones can be handled efficientlyusing geographically scoped queries.

A significant challenge results from non-uniformsensor node deployments that are likely due to deploy-ment terrain, requiring that routing protocols beaugmented with the capability to route around obstacles.

1.4. System monitoring and maintenance

System monitoring is a challenging problem in mostdistributed systems, where the ratio of number ofdevices to the number of personnel to maintain themis large. A suite of tools that ease the maintenanceoverhead is a key factor in the usefulness of suchsystems. Most typical systems, from operating systemson machines that we use everyday, to servers androuters, use some kind of logging facility to keep trackof system state. In distributed systems, these logs areoften transferred over the web to places where they canbe maintained persistently. This log information can beremotely analyzed to monitor security problems, orsystem failures. Such monitoring often assumes thatmaintaining logs is cheap, and bandwidth inexpensive.While such an assumption is certainly true of today’sInternet, energy constraints on sensor networks render itimpossible to communicate extensive logs from nodes inthe network to a central location. More discriminatingmethods of carefully selecting the data to be sent, andin-network aggregation techniques to reduce the debug-ging data need to be designed.Such network monitoring and management schemes

requires a suite of tools for networks of different scales,ranging from laboratory-scale networks comprised of a

handful of nodes to operational networks comprisinghundreds of nodes.

1.5. Case study: distributed networked storage and

feature extraction

To see how these various networking primitives canbe brought together to construct a flexible, yet efficientdata-processing application, we consider the example ofa networked storage infrastructure, Dimensions [13].This system creates multi-resolution summaries of data,and moves these summaries around the network tofacilitate both spatio-temporal feature extraction andlong-term data storage. Such an infrastructure usesmany of the above-mentioned networking primitives,

* Duty-cycling of the radio while supporting multi-hopcommunication can be done by using MAC-leveltechniques such as SMAC.

* Attribute-based naming is used to express queries onspatio-temporal attributes of data.

* Geographic Routing is used to route data to specificlocations in the network for long-term storage.

* Distributed monitoring schemes can assist long-termmaintenance under varying environmental conditions.

2. Adaptive duty cycling

This section describes techniques that enables low-duty-cycle operations of sensor nodes, which are criticalfor prolonging network lifetime.

2.1. Overview

Wireless sensor networks are designed to operate forlong time. However, nodes are in idle state for most timewhen no sensing event happens. It would be a significantwaste of energy if all nodes always keep their radios on,since the radio is a major energy consumer. Measure-ments have shown that a typical radio consumes thesimilar level of energy in idle mode as in receiving mode[24,40]. It is important that nodes are able to operate inlow duty cycles.Current research on adaptive duty cycling can be

broadly divided into three categories: general schemes ofsleep and wake-up, low duty cycle combined with MACprotocols, and duty-cycle control through topologymanagement.Research on general schemes of sleep and wake-up

focuses on how to set up the communications betweentwo nodes that are normally in sleep mode. Theseschemes are not tightly coupled with other protocolssuch as MAC and topology control. Piconet [4] is suchan example. In Piconet, each node randomly goes intosleep mode, and periodically wakes up for a short period

ARTICLE IN PRESSD. Ganesan et al. / J. Parallel Distrib. Comput. 64 (2004) 799–814802

time. Every time a node wakes up, it broadcasts abeacon including its own ID. If other nodes want to talkto this node, they need to wake up and listen untilreceiving the beacon.Another example is STEM [35]. STEM uses two

radios operating in different channels, one for datatransmission and the other for node wake-up. Whenthere are no data to send, nodes turn off their data radiocompletely, and put the wake-up radio in low-duty-cyclemode as in Piconet. Unlike Piconet, in STEM a sender isresponsible for waking up the receiver. It does soby sending a wake-up tone (STEM-T) or beacon(STEM-B). Since nodes do not synchronize on theirwake-up time, the tone or beacon must be long enoughfor the intended receiver to receive it.The second category is MAC protocols with low-duty

cycles. This is a broad research area, and can be furtherdivided as TDMA protocols and contention protocols.TDMA-based MACs naturally enable low-duty-cycle

operations on nodes, since they only need to turn ontheir radio during their own time slots for sending andreceiving. Typical examples include Bluetooth [15] andLEACH [19]. In both protocols, nodes form clusters,and TDMA is used for intra-cluster communications.The major disadvantage of TDMA protocols is scal-ability, i.e., it is difficult to dynamically change theframe size or the number of slots when the number ofnodes changes. For example, Bluetooth may have atmost 8 active nodes in a cluster.Sohrabi and Pottie [38] proposed a self-organization

protocol for wireless sensor networks. Each nodemaintains a TDMA-like frame, in which the nodeschedules different time slots to communicate with itsknown neighbors. Nodes go to sleep during the timethat is not scheduled.Among contention-based MACs, the IEEE 802.11

distributed coordination function (DCF) [25] is widelyused in ad hoc networks. It has a power save (PS) mode,which enables low-duty-cycle operations. In PS mode,nodes periodically sleep and wake up, and sychronize ontheir wake-up time. 802.11 assumes all nodes are withinone hop. In multi-hop operation, the PS mode may haveproblems in clock synchronization, neighbor discoveryand network partitioning, as pointed out in [42]. Tsenget al. [42] proposed three sleep schemes to improve the802.11 PS mode. None of them requires node synchro-nizations. The cost is more frequent beaconing and morewake-up packets before broadcasts.S-MAC [44,45] is a contention MAC with integrated

low-duty-cycle operation that supports multi-hop opera-tion. More details of S-MAC are described in Section 2.2.The last category of adaptive duty cycling is achieved

through topology control. Protocols in this categoryexplore benefits that a dense network provides forenergy savings. The basic idea is to only power on asmall number of nodes that are sufficient to maintain

network connectivity. Examples include GAF [43],SPAN [8] and ASCENT [7].GAF utilizes geographic location information, and

divides the network into fixed square grids. Within eachgrid, nodes are equivalent from the routing point ofview, so only one node needs to be active at any giventime. In SPAN, each node decides whether to sleep orjoin the backbone based on connectivity informationsupplied by a routing protocol. In ASCENT, each nodemakes the decision only based on locally measuredpacket loss and connectivity information. More detailsof ASCENT is provided in Section 2.3

2.2. S-MAC

S-MAC [44,45] explores design trade-offs for energyconservation in the MAC layer. It reduces energyconsumption on radio from the following sources:collision, control overhead, overhearing unnecessarytraffic, and idle listening. Idle listening is the idle statethat the radio keeps listening for possible traffic.The basic scheme of S-MAC is to put all nodes into

low-duty-cycle mode—periodic listen and sleep. Whennodes are listening they follow a contention rule toaccess the medium, which is similar to the IEEE 802.11DCF [25]. The following discussion focuses on theadaptive duty cycling in S-MAC.

2.2.1. Coordinated sleeping

In S-MAC, nodes exchange and coordinate on theirsleep schedules rather than randomly sleep on their own.Before each node starts the periodic sleep, it needs tochoose a schedule and broadcast it to its neighbors. Toprevent long-term clock drift, each node periodicallybroadcasts its schedule as the SYNC packet. To reducecontrol overhead and simplify broadcasting S-MACencourages neighboring nodes to choose the sameschedule, but it is not a requirement. A node first listensfor a fixed amount of time, which is at least the periodfor sending a SYNC packet. If it receives a SYNCpacket from any neighbor it will follow that schedule bysetting its own schedule to be the same. Otherwise, thenode will choose an independent schedule after theinitial listen period.It is possible that two neighboring nodes have two

different schedules. If they are aware of each other’sschedules, they have two options: (1) following twoschedules by listening at both scheduled listen time;(2) only following its own schedule, but sending twice toboth schedules when broadcasting a packet. In somecases the two nodes may not be aware of the existence ofeach other, if their listen intervals do not overlap at all.To solve the problem, S-MAC let each node periodicallyperform neighbor discovery, i.e., listening for the entireSYNC period, to find unknown neighbors on a differentschedule.

ARTICLE IN PRESS

Listen Listen

for RTS for CTSfor SYNC

Sleep

time

Sleep

Fig. 1. Low-duty-cycle operation of each node in S-MAC.

D. Ganesan et al. / J. Parallel Distrib. Comput. 64 (2004) 799–814 803

Fig. 1 depicts the low-duty-cycle operation of eachnode. The listen interval is divided into two parts forboth SYNC and data packets. There is a contentionwindow for randomized carrier sense time beforesending each SYNC or data (RTS or broadcast) packet.For example, it node A wants to send a unicast packetto node B, it first perform carrier sense during B’s listentime for data. If carrier sense indicates an idle channel,node A will send RTS to node B, and B will reply with aCTS if it is ready to receive data. After that, they will usethe normal sleep time to transmit and receive actual datapackets. Broadcast does not use RTS/CTS due to thepotential collisions on multiple CTS replies.Low-duty-cycle operation reduces energy consumption

at the cost of increased latency, since a node can only startsending when the intended receiver is listening. S-MACdeveloped an adaptive listen scheme to reduce the latencyin a multi-hop transmission. The basic idea is to let thenode who overhears its neighbor’s transmissions (ideallyonly RTS or CTS) wake up for a short period of time atthe end of the transmission. In this way, if the node is thenext-hop node, its neighbor is able to immediately passthe data to it instead of waiting for its scheduled listentime. If the node does not receive anything during theadaptive listening, it will go back to sleep.

2.2.2. Trade-offs on energy, latency and throughput

With low-duty-cycle operation S-MAC effectivelyreduces the energy waste due to idle listening. Experi-mental results show that an 802.11-like protocol withoutsleeping consumes 2–6 times more energy than S-MACfor traffic load with messages sent every 1–10 s in a 10-hop network. On the other hand, S-MAC with adaptivelisten has about twice the latency as the MAC withoutsleeping. Periodic sleeping increases latency and reducesthroughput. However, adaptive listening largely reducessuch cost. It enables each node to adaptively switchmode according to the traffic in the network. The overallgain on energy savings is much larger than theperformance loss on latency and throughput [45].

2.3. ASCENT: adaptive self-configuring sEnsor networks

topologies

To motivate the need for ASCENT, consider ahabitat monitoring sensor network that is to bedeployed in a remote forest. Deployment of this network

can be done, for example, by dropping a large numberof sensor nodes from a plane, or placing them by hand.Besides the fundamental theme of energy conservation,such deployments introduce scaling challenges:

* Ad hoc deployment: The sensor field cannot beexpected to be deployed in a regular fashion (e.g. alinear array, two-dimensional lattice). More impor-tantly, uniform deployment need not correspond touniform connectivity owing to unpredictable propa-gation effects when nodes, and therefore antennae,are close to the ground and other surfaces.

* Unattended operation under dynamics: The anticipatednumber of elements in these systems will precludemanual configuration, and the environmental dy-namics will preclude design-time pre-configuration.

In many such contexts it will be far easier to deploylarger numbers of nodes initially than to deployadditional nodes or additional energy reserves at a laterdate (similar to the economics of stringing cable forwired networks). ASCENT describes one approachtowards exploiting the resulting redundancy in orderto extend system lifetime. If too few of the deployednodes are used, the distance between neighboring nodeswill be too great and the packet loss rate will increase; orthe energy required to transmit the data over the longerdistances will be prohibitive. If all deployed nodes areused simultaneously, the system will be expendingunnecessary energy, at best, and at worst the nodesmay interfere with one another by congesting thechannel. ASCENT describes a localized, energy efficient,and adaptive procedure for addressing this problem.The two primary contributions of the design of

ASCENT are:

* The use of adaptive techniques that permit applica-tions to configure the underlying topology based ontheir needs while trying to save energy to extendnetwork lifetime. Results presented in [7] show thatASCENT always improves network lifetime (definedas time till 90% of network runs out of power), andfor large densities is a factor of 3 better than anetwork not running the protocol.

* The use of self-configuring techniques that react tooperating conditions measured locally. ASCENT isdesigned to react when links experience high packetloss. Results show that a stable packet delivery ratioðE100%Þ is maintained, that is between 10% and20% better than a setup without ASCENT. It doesnot detect or repair network partitions and comple-mentary techniques may be required to address suchproblems.

2.3.1. ASCENT design

ASCENT adaptively elects ‘‘active’’ nodes from allnodes in the network. Active nodes stay awake all the

ARTICLE IN PRESSD. Ganesan et al. / J. Parallel Distrib. Comput. 64 (2004) 799–814804

time and perform multi-hop packet routing, while therest of the nodes remain ‘‘passive’’ and periodicallycheck if they should become active. Consider a simplesensor network for data gathering. Such a sensor fieldcannot be expected to have uniform connectivity due tounpredictable propagation effects in the environment.Therefore, regions with low and high density can beexpected. ASCENT does not deal with complete net-work partitions; it assumes that there is a high enoughnode density to connect the entire region. Fig. 2 shows asimplified schematic for ASCENT during initializationin a high-density region.Initially, only some nodes are active. The other nodes

remain passively listening to packets but not transmit-ting as shown in Fig. 2(a). The source starts transmittingdata packets toward the sink. Because the sink is at thelimit of radio range, it gets very high message loss fromthe source. This situation is called a communicationhole; the receiver gets high packet loss due to poorconnectivity with the sender. The sink then startssending help messages to signal neighbors that are inlisten-only mode—also called passive neighbors—to jointhe network. When a neighbor receives a help message,it may decide to join the network. This situation isillustrated in Fig. 2(b). When a node joins the network itstarts transmitting and receiving packets, i.e., it becomesan active neighbor. As soon as a node decides to join thenetwork, it signals the existence of a new active neighborto other passive neighbors by sending a neighborannouncement message. This situation continues untilthe number of active nodes stabilizes on a certain valueand the cycle stops. When the process completes, thegroup of newly active neighbors that have joined thenetwork make the delivery of data from source to sinkmore reliable. The process re-starts when some futurenetwork event (e.g. node failure) or environmental effect(e.g. new obstacle) causes message loss again.

3. Data management

This section describes schemes that address thesechallenges to manage data. We will focus on architec-tural frameworks for data management, rather than

Fig. 2. Self-configation of a senso

specific aggregation schemes that fit into one or more ofthese frameworks.

3.1. Overview

Data management approaches to sensor networkshave broadly followed similar themes, although theirspecific approaches have varied widely. Fig. 3 showsthree popular techniques, Directed Diffusion, Cougarand TinyDB. These schemes are based on two primaryarchitectural principles:

* Use a declarative language to specify queries on data:A declarative language can be especially useful todescribe sensor network interaction since it hidesdetails of sensor node interaction, routing andplacement of in-network processing from users.

* Support in-network processing to reduce data within

the network: Since local computation is much cheaperthan radio communication [32], shifting computationinto the network can provide significant energysavings.

While all three schemes are based on the aboveprinciples, there are significant differences in theirapproach. Directed Diffusion is a library-based lower-level approach, and provides Filters [37] to easeapplication development. The Filter API exposes theunderlying routing structure to an application designerand can be used to write algorithms that optimizecommunication and in-network processing to applica-tion requirements. We describe this approach in greaterdetail in Section 3.2.Both Cougar and TinyDB use a database approach

towards sensor data management.The Cougar device database system proposes dis-

tributing database queries across a sensor network asopposed to moving all data to a central site [5]. Sensordata is represented as an Abstract Data Type attribute,the public interface to which corresponds to specificsignal processing functions supported by a sensor type.Joins or aggregation in the network are performed asspecified by a centrally computed query plan. As shownin Fig. 3, queries are declarative, and users can issuequeries without knowing how the data is generated in

r network using ASCENT.

ARTICLE IN PRESS

Scheme Type Platform Query SpecificationLanguage

Placement of In-network Processing Paradigm Supported RoutingStrategies

DirectedDiffu-sion

Non-database

Ipaq-class(TinyDif-fusion formote-classnodes)

Application-specificlanguage can be definedusing Filters [28]

Distributed - Routes are independently constructedfrom each source of data to data sink. Aggregation isperformed at junctions where data combines. CalledOpportunistic Aggregation

Publish-subscribe - cansupport multiple datasources and sinks

Two-phase commitwhen location ofnodes is not knownand GEAR whenlocation is known

Cougar Database Ipaq-class SQL-like Centralized - A gateway node (query optimizer) main-tains a catalog of status of sensor nodes. For each newquery, optimizer selects a plan that describes both data-flow in the network and computation flow in the net-work.

Cluster-based Not explicitly speci-fied, any ad-hoc rout-ing protocol

TinyDB Database Mote-class SQL-like Opportunistic Aggregation - Routes are based on re-verse shortest path from the base-station. Conceptuallysimilar to route construction in Directed Diffusion

reverse multicast tree- all leaves are datasources

parent-child routingon tree

Fig. 3. Comparison of Data Management strategies for sensor networks.

D. Ganesan et al. / J. Parallel Distrib. Comput. 64 (2004) 799–814 805

the sensor network and how the data is processed tocompute the query answer. An underlying queryoptimizer decides on the placement of in-networkprocessing, and is performed by a gateway node thathas an idea of the status of nodes and links inthe network. Such a centralized scheme can be inefficientif the status of nodes is rapidly changing (e.g. environ-mental dynamics), but can enable a potentially moreefficient global optimization of in-network query pro-cessing.TinyDB is a query processing system with small

footprint intended for highly resource-constrained motesensor nodes [20]. TinyDB provides a declarative inter-face for data collection and aggregation inspired byselection and aggregation facilities in database querylanguages. TinyDB’s aggregation service, Tag (TinyAggregation [26]), operates on a routing tree structure,where the root is typically a base-station or other egresspoint to users, and leaves of the tree constitute all nodesin the network.An instance of an aggregation query specified using

Tag is shown below:

SELECT AVG(volume), room FROM sensors

WHERE floor = 6

GROUP BY room

HAVING AVG(volume) 4 threshold

EPOCH DURATION 20s

In this example, SELECT specifies arithmetic opera-tions over aggregation attributes. The WHERE clausespecifies which sensor readings should be transmittedfrom each node, and which should be discarded. TheGROUP BY operator can be used to partition data intogroups such that the aggregation operators can beapplied separately to each group. In the above example,sensor readings are grouped by the room, and theaverage over each room computed. Periodicity of thisquery is specified by the EPOCH clause.Two other internet-centric approaches bear mention

due to their relation to naming sensor data.

The Intentional Naming System is an attribute-basedname system operating in an overlay network over theInternet [2]. Its use of attributes as a structuringmechanism and a method to cope with dynamicallylocating devices is similar in spirit to Directed Diffusion.DataSpace describes an attribute-based naming me-

chanism for querying physical objects that produce andstore local data [21]. The DataSpace is divided intosmaller administrative and logical datacubes, which arelogically grouped into dataflocks. Query results mayinvolve aggregation of more specific queries addressedto sub-datacubes. This approach is internet-centric anddoes not explore in-network processing.

3.2. Directed diffusion

Directed Diffusion has three key characteristics:localized algorithms, named data, and support for in-network processing. Diffusion adopts a declarative,publish/subscribe API that isolates data producers andconsumers from the details of the underlying datadissemination algorithms [18]. The key abstraction ofthis API is that data is identified by a set of attributes,data producers (or sources) generate data it by publish-ing, data consumers (or sinks) subscribe to data, and it isthe business of the diffusion implementation to ensurethat data travels from publisher to subscriber efficiently.Diffusion encourages applications to influence data flowthrough the use of filters [37] and in-network processing,but many applications require only attribute-selecteddata and allow diffusion to completely control routing.Many different algorithms can match publishers andsubscribers without change to the high-level API orsemantics.

Routing in directed diffusion: Diffusion uses a two-phase algorithm [18] where data consumers seek outdata sources, and then sources search to find the bestpossible path back to subscribers. A subscriber, or datasink, identifies data by a set of attributes. Thisinformation propagates through the network in aninterest message. In principle, information cached from

ARTICLE IN PRESSD. Ganesan et al. / J. Parallel Distrib. Comput. 64 (2004) 799–814806

prior runs, other constraints (such as geographicinformation), or application-specific filters can be usedto optimize the distribution of interests. Without suchinformation, however, interests must be floodedthroughout the network to find any data sources (asshown in Fig. 4(a)). As they are distributed, nodesestablish gradients, state indicating the next-hop direc-tion of other nodes interested in the data (Fig. 4(b)).When an interest arrives at a data producer, that sourcebegins producing data. (To conserve power, nodes mayavoid producing data before being triggered, or theymay produce and store such data locally.) The first datamessage sent from the source is marked as exploratoryand is sent to all neighbors that have matchinggradients. As with interest messages, this transfer couldbe limited using additional information or applicationinvolvement, but by default it is sent to all nodes. Whenexploratory data reaches the sink, the sink reinforces itspreferred neighbor (Fig. 4(c)), establishing a reinforcedgradient towards the sink. (Preference being given to thelowest latency neighbor, possibly modified by otherconcerns such as link quality or energy.) The reinforcedneighbor reinforces its neighbor in turn, all the way backto the data source or sources, resulting in a chain ofreinforced gradients from all sources to all sinks.Subsequent data messages are not marked exploratory,and are sent only on reinforced gradients rather than toall neighbors. Nodes can also generate negative reinfor-cements if they receive data that is not relevant to them.Typically, this occurs when topology changes andmultiple gradients accidentally point to the same node.A negative reinforcement corrects this situation.Gradients are managed as soft-state, thus both

interests and exploratory data occur periodically torefresh this state. Interests are sent every interestinterval, exploratory data every exploratory interval.Variants of this basic model have been proposed to

optimize Diffusion for different application require-ments. For instance, a two-phase diffusion model maynot be efficient when applications have many sourcesand sinks cross-subscribed to each other, a case thatresults in a large amount of control traffic, even withgeographic scoping. Push Diffusion, introduced in [17],reverses the role of data publishers and subscribers,causing data sources to actively search for consumers.

Sink

Event

Source

Interests

Event

Source

(a) (b)

Fig. 4. A simplified schematic for directed diffusion: (a) interest propagatio

An advantage of push is that it requires only one phasewhere control traffic needs to be widely disseminated tofind sinks, unlike the two phases needed in two-phasepull. One-phase pull [17] is a third diffusion algorithmthat simplifies two-phase pull by eliminating one side ofthe search.

3.2.1. Aggregation using filters

An example of filter-driven data aggregation usingDirected Diffusion can be used to illustrate how in-network processing can reduce data traffic to conserveenergy.An anticipated sensor application is to query a field of

sensors and then take some action when one or more ofthe sensors is activated. For example, a surveillancesystem could notify a biologist if an animal enters aregion. Coverage of deployed sensors will overlap toensure robust coverage, so one event will likely triggermultiple sensors. All sensors will report detection tothe user, but communication and energy costs can bereduced if this data is aggregated as it returns to theuser. Data can be aggregated to a binary value (therewas a detection), an area (there was a detection inquadrant 2), or with some application-specific aggrega-tion (seismic and infrared sensors indicate 80% chanceof detection).Although details of aggregation can be application-

specific, the common systems problem is the design ofmechanisms for establishing data dissemination paths tothe sensors within the region, and for aggregatingresponses. Consider how this kind of data fusion maybe implemented in a traditional network with topologi-cally assigned low-level node names. First, in order todetermine which sensors are present in a given region, abinding service must exist which, given a geographicalregion, lists the node identifiers of sensors within thatregion. Once these sensors are tasked, an electionalgorithm must dynamically elect one or more networknodes to aggregate the data and return the result tothe querier.Instead, Directed Diffusion [18] addresses this pro-

blem using opportunistic data aggregation. Sensorselection and tasking is achieved by naming nodes usinggeographic attributes. As data is sent from the sensors tothe querier, intermediate sensors in the return path

Sink

Gradients

Sink

Event

Source

(c)

n, (b) initial gradients set up, (c) data delivery along reinforced path.

ARTICLE IN PRESSD. Ganesan et al. / J. Parallel Distrib. Comput. 64 (2004) 799–814 807

identify and cache relevant data. This is achieved byrunning application-specific filters. These intermediatenodes can then suppress duplicate data by simply notpropagating it, or they may slightly delay and aggregatedata from multiple sources.Filters benefit opportunistic aggregation strategies in

many ways. It provides a natural approach to injectapplication-specific code into the network. Attributenaming and matching allow these filters to remaininactive until triggered by relevant data. A commonattribute set means that filters incur no network costs tointeract with directory or mapping services.More complex examples of in-network aggregation

using filters are also discussed in [18]. Nested queries,where one sensor cues another, can be used fortriggering, and used to reduce overall energy consump-tion significantly. For example, a person entering aroom is often correlated with changes in light or motion,Multi-modal sensor networks can use these correlationsby triggering a secondary sensor based on the status ofanother, in effect nesting one query inside another.Reducing the duty cycle of some sensors can reduceoverall energy consumption (if the secondary sensorconsumes more energy than the initial sensor, forexample as an accelerometer triggering a GPS receiver)and network traffic (for example, a triggered imagergenerates much less traffic than a constant videostream). Alternatively, in-network processing mightchoose the best application of a sparse resource (forexample, a motion sensor triggering a steerable camera).

1A communication hole is when greedy forwarding reaches a local

maximum where the current node is closer to the destination than any

of its neighbors.

4. Geographic routing

The physical nature of a sensor network’s deploymentmakes geographically scoped queries natural. If nodesknow their locations, then geographic queries can beleveraged to constrict the data dissemination to therelevant region and to reduce routing control overhead.Geographic Routing protocols such as GPSR [23] orGEAR [46] can be used to optimize the process offinding sources by using geographic information toconstrain the search process. Since geographical routingdoes not have to be tied to any particular datadissemination scheme, we describe these protocolsseparate from our discussion of data managementschemes. However, these routing protocols can beintegrated quite easily into the attribute-naming frame-works where location is considered an attribute.

4.1. Overview

Most geographical routing protocols use greedyforwarding to route packets to the destination. Theydiffer from each other in how they handle communica-tion holes. Among early work in geographical routing,

Finn [12] used a restricted flooding search to navigatearound holes.1 One drawback of this mechanism is thedifficulty in determining an appropriate scope for thesearch.Greedy Perimeter Stateless Routing (GPSR [23]),

elegantly avoids this problem by deriving a planar graphout of the original network graph. When a packetreaches a region where greedy forwarding is impossible,GPSR recovers by forwarding the packet along theperimeter of the planar graph to circumvent commu-nication holes. We discuss this scheme in more detail inSection 4.2.While GPSR addresses point-to-point routing,

GEAR (Geographic and Energy-Aware Routing) [46]studies the problem of forwarding a packet to all thenodes inside a target region, which is a commonprimitive in data-centric sensor network applications[22]. The protocol has two distinctive features:

* It uses an energy-aware neighbor selection heuristicto forward packets. Nodes with more remainingenergy are preferred over depleted nodes to avoidnodes that

* It proposes a Recursive Geographic Forwardingalgorithm to dissemminte packets within a targetregion. Systematic unicast is used to deliver packetsto the region as opposed to broadcast, thus limitingthe receive power expended by nodes.

GEAR (Geographic and Energy-Aware Routing) hasbeen used to extend Directed Diffusion when nodelocations and geographic queries are present [46].

4.2. Greedy perimeter stateless routing (GPSR)

GPSR offers two benefits. First, it is stateless, andrequires propagation of topology information onlywithin a single hop neighborhood. Such a localizedalgorithm scales better in dynamic networks, where aglobal algorithm would need to track each topologychange or risk having inaccurate information. Second,when a packet reaches communication holes, GPSRuses a distributed perimeter forwarding algorithm thatroutes around such obstacles.There are two main components to GPSR:Greedy forwarding: A forwarding node makes a

locally optimal, greedy choice for the next hop bychoosing the neighbor that is geographically closestto the packets destination (as shown in Fig. 5). Suchan approach has negligible state overhead that dependson the density of deployment, and not on the totalnetwork size.

ARTICLE IN PRESS

Routing Void

Right Hand Ruleforwarding

Source

Sink

a

b

c d

Fig. 5. Greedy and Perimeter Forwarding in GPSR: Greedy mode is

used till packet reaches a routing void. Perimeter forwarding uses

graph planarization techniques to route around voids.The well-known

right-hand rule traverses the interior of a closed polygonal region in

clock-wise edge order, in this case, a �4b �4c �4d:

D. Ganesan et al. / J. Parallel Distrib. Comput. 64 (2004) 799–814808

However, greedy forwarding will not work intopologies such as Fig. 5, where to reach the destination,the packet would need to be routed farther in geometricdistance from the destination.

Perimeter forwarding: GPSR explores two distributedalgorithms to construct planar graphs, the RelativeNeighborhood Graph(RNG) and Gabriel Graph (GG),to enable perimeter forwarding. These proceduresinvolve selecting a subset of edges in the network suchthat there are no crossing edges in the resulting graph.Given such a sub-graph, perimeter forwarding can bedone in a straightforward manner using the Right Handtraversal rule. A detailed treatment of the planarizationprocedure can be obtained from [23].The complete GPSR algorithm combines greedy

forwarding on the full network topology, with theperimeter forwarding on the subgraph when greedyforwarding becomes impossible.By keeping state only about local topology and using

it in the packet forwarding, GPSR scales better thanother ad hoc routing protocols, However, the derivedplanar graph in GPSR is much sparser than the originalone, and the traffic concentrates on the perimeter of theplanar graph in perimeter mode. Thus, the nodes on theplanar graph tend to be depleted quickly.

5. Sensor network monitoring and maintenance

As in most distributed systems, it is important to havean infrastructure for wireless sensor networks to provideindication of node failures, resource depletion, andoverall system performance. More than for wired

networks and ad hoc wireless networks, sensor networksneed a comprehensive system for understanding theexecution of the network as a whole. There are severalreasons for this: sensor networks implement intricatedistributed and collaborative applications; sensor nodesuse wireless communication the vagaries of which arewell documented; nodes detect physical events in theenvironment, and the variability in the physical phe-nomena they sense at least rivals that of wirelesscommunication. However, existing distributed monitor-ing and management protocols are designed for multi-computer distributed systems connected with wirednetworks. For example, SNMP [6] provides per-deviceinformation databases for network monitoring andmanagement, where node state and logs can be collectedand processed. However, the energy and bandwidthconstraints of wireless sensor networks preclude suchcentralized solution. In addition, collaboration betweenredundant sensor nodes imposes challenges in thedefinition of system failure and status. One or few nodecrashes do not necessarily undermine system function-ality. A regional or global view of the system is helpfulto identify possibility of failures.

5.1. Overview

Several protocols are proposed recently to addressdifferent important aspects of system monitoring andmaintenance for sensor networks. To meet the uniquedesign challenges in wireless sensor networks, explicitlyor implicitly, those proposals share two common designprinciples: localized network state exchange and in-network aggregation.For example, the coverage problem in wireless sensor

networks is studied in [28,29]. By exchanging sensorcoverage information within a local neighborhood, thesetechniques detect maximal breach path and maximalsupport path, along which there is poorest and bestcoverage of sensors, respectively. In [9,39], node failureinformation is collected using similar techniques. Bylimiting state exchange of nodes to only their localvicinity, these protocols successfully monitor collectivenetwork states without communicating individual nodestate over long distance.

eScan [47] takes advantage of in-network aggregationto construct an aggregated map of the remaining energylevels for different regions in a sensor field. Instead ofextracting individual node states, the protocol combinesresidual energy level information into a more compactform if those nodes are nearby and have similar energylevels. To accomplish a similar task, [30] applies anetwork state model to represent node behavior. Byprobabilistically predicting the energy consumption, itreduces the communication overhead of reportingenergy levels to the monitoring node. Dimensions [13]proposes the use of multi-resolution wavelet processing

ARTICLE IN PRESS

Scan AV=(35%,37%)

Scan BV=(35%,36%) Scan C

V=(35%,37%)

Fig. 7. Representation and aggregation of eScans.

D. Ganesan et al. / J. Parallel Distrib. Comput. 64 (2004) 799–814 809

to compress debugging data such as network-widepacket-loss statistics. This scheme has two properties:(a) correlations in data over time and spatial dimensionscan be exploited, and (b) the scheme provides approx-imate, but sufficiently accurate responses to manyqueries with low communication overhead.Some protocols are designed to self-regulate monitor-

ing activity when network workload is heavy or at leastprovide a ‘‘knob’’ for adaptive aggregation. In eScan

[47], resolution (how close nodes are geographicallynearby) and tolerance (how similar their energy levelsare) can control the extent of aggregation. In STREAM[10], topology mapping of sensor networks can becollected at different levels of detail, which is decidedby the parameters of ‘‘virtual range’’ and ‘‘resolutionfactor’’.Note that none of techniques above are specific to

sensor network monitoring and management. Theyapply in general to most scenarios of sensor networkapplication design and implementation. However, mon-itoring protocols have unique requirements: given thatmonitoring tends to be a long-term activity on largenetworks, the energy constraints are more strict on thoseprotocols. Furthermore, those protocols are supposed tobe even more robust than generic applications becausethey might be the last resort when a massive failurehappens.

5.2. A monitoring architecture for wireless sensor

networks

In this section, eScan [47] and digest [48] areillustrated as two examples of sensor network monitor-

Fig. 6. Monitoring wirele

ing solutions. These tools can be combined into acoherent architecture for monitoring sensor networks[48]. This architecture is distinguished by three levels ofmonitoring, where each level consists of a class of tools.Each level is distinguished from the next in the spatial ortemporal scale at which the corresponding tools areinvoked. This is illustrated in Figs. 6 and 7.The first component consists of tools such as dump,

which can seen as more classical SNMP like componentin this architecture. Upon user request, dump collectsdetailed node state or logs over the network fordiagnosis. For example, the raw temperature readingsfrom some sensors could be dumped to debug thecollaborative event detection algorithm between nearbynodes. Dump can be implemented as an generic

ss sensor networks.

ARTICLE IN PRESSD. Ganesan et al. / J. Parallel Distrib. Comput. 64 (2004) 799–814810

application over Directed Diffusion [22] or other datamanagement frameworks . Because the amount of dataper node may be large, dump should be invoked only atsmall spatial scales (i.e., from a few nodes), and onlywhen there is a reasonable certainty of a problem atthose nodes.A second class of tools, scans, is envisoned to guide

system administrators to the location of problems. Scansrepresent abstracted views of resource consumptionthroughout the entire network, or throughout asignificant section of the network. Thus, this class oftools has a significantly greater spatial extent thandumps. One example of a scan is the eScan [47]. Tocompute an eScan, a special user-gateway node initiatescollection of node state, for instance residual energysupply level, from every node in the system. Instead ofdelivering the raw data to user node, eScan computationtakes advantage of in-network aggregation. Residualenergy level data from individual nodes are combinedinto more compact forms, if and only if those nodes arenearby and have similar energy level. By pushing thedata processing into the network, eScan constructs anapproximate system-wide view of energy supply levelswith much less communication cost as compared tocentralized collection. From such a global view, usersare able to isolate those nodes upon which they caninvoke tools such as dump. Simulation studies in [47]reveal that good aggregation benefit (E15% datareduction) is obtained with low distortion ðE5%Þ inthe energy scans collected from a 400 node randomlyplaced network.Clearly, the energy cost of collecting an eScan can be

significant, and a third class of tools, digests [48], canhelp alert users to error conditions (partitions, nodedeaths) within the network. A digest is an aggregate ofsome network property in small size (say a few bytes).For example, the size of network i.e. the number ofnodes, can indicate several system health conditions:Sudden drop in the network size can be taken as hint formassive node failure or network partitioning. Digests,like eScans, also span the entire network, or a largespatial extent. However, unlike eScans, they arecontinuously computed. Digests are not intended toisolate network problems, merely to tell users when toinvoke network-wide scans.

6. Case study: building a distributed storage framework

We consider the application of the networkingschemes described above in the context of a distributeddata storage and querying framework, Dimensions[13,14]. This system is targeted at long-term scientificdeployments, such as micro-climate monitoring [16,27],to obtain data about previously unobservable phenom-ena for detailed analysis by experts in the field.

Consider the following sample deployment: A medium

scale wireless sensor network (many hundreds of nodes) is

deployed in parts of a reserve park such as the James

Reserve, for monitoring climatic variations at finer

granularities, and larger scales than previously possible.

These nodes are equipped with weather sensors: tempera-

ture, pressure, humidity and rainfall. These sensor

networks can be queried through a gateway (or base-

station) connected to the internet. User queries are

typically spatio-temporal, including simple min-max

queries, more complex queries searching for edge

patterns, long-term data trends, anomalous behavior,frequency of events, etc. For the small subset of nodes

that satisfy these queries, the user may decide to obtain

more detailed datasets for further off-line analysis. This

could be requested in a multi-resolution manner, first

obtaining a low-resolution dataset which can be used to

‘‘decide’’ that a higher resolution, possibly lossless,dataset needs to be extracted.Data analysis in such applications often involves

complex signal manipulation, including modeling,searching for new patterns or trends, looking forcorrelation structures, etc. Conventional approaches tosuch monitoring has involved wired and sparselydeployed networks that transfer all data from sensorsto a central data repository for persistent storage. Theconsequence of limited energy and storage resources ofsensor nodes introduce two problems: (a) it severelylimits deployment lifetime if all raw data must betransmitted to a central location, [1], and (b) the storageresources on individual sensor nodes are often notsufficient to store data over long durations losslessly.We provide a high-level description of the methodologyused in Dimensions to tackle these problems, and thendiscuss different ways in which networking techniques fitinto such a system.

6.1. Dimensions architectural overview

Dimensions [14] constructs progressively lossy hier-archy of wavelet summaries of sensor data anddistributes them around the network. These summariesare generated in a multi-resolution manner, i.e. there areequal-sized summaries corresponding to different spatialand temporal scales. Fig. 8 illustrates their construction:at each higher level of the hierarchy, summariesencompass larger spatial scales, but are compressedmore, and therefore more lossy. At the highest level, oneor a few nodes have a very lossy summary for all data inthe network. Such a multi-resolution approach results ina per-node worst case communication cost of Oðlog nÞas opposed to Oð

ffiffiffi

np

Þ for centralized data collection,where n is the number of nodes in the network.Queries into the network are routed in the reverse

order of building the summaries. A query is injected intothe network at the highest level of the hierarchy, and is

ARTICLE IN PRESS

Fig. 8. Constructing a dimensions hierarchy: temporal and spatial summarization.

D. Ganesan et al. / J. Parallel Distrib. Comput. 64 (2004) 799–814 811

processed on a coarse highly compressed summarycorresponding to a large spatio-temporal volume. Theresult of this processing is an approximate result thatindicates which regions in the network are most likely toprovide a more accurate response to the query. Thequery is forwarded to summaries for these regions of thenetwork, and processed on more detailed views of thesesub-regions. This procedure continues until the query isrouted to a few nodes at the lowest level of thehierarchy. In this way, query processing cost can bedramatically reduced when the target data is sparselydistributed. The authors show that for the studied set ofquery types (Max, Min, Avg, Edge), Dimensions cansatisfy queries to within E10% error by querying lessthan 5% of a 160 node network.Long-term storage in Dimensions exploits the pro-

gressively lossy storage model. Summaries are aged suchthat those corresponding to larger spatial areas andlonger time-scales are retained for longer periods oftime. Thus, even though raw sensor data and highresolution summaries may be aged early, coarsesummaries are retained about past data and can beused for long-term querying, at the cost of slightlyreduced accuracy.

6.2. Networking schemes in dimensions

A system such as Dimensions has numerous network-ing components. Significant amount of data is trans-ferred around the network in the form of summaries atdifferent spatio-temporal resolutions. Drill-downqueries need to be expressed in a query language, andinvolve in-network computation at different levels of the

hierarchy. In this section, we discuss how thesecomponents can be constructed using schemes that wehave discussed.

6.2.1. Adaptive duty cycling

As described in Section 2, multihop routing requiresthat radios be turned on, whereas nodes would need toswitch them off for power conservation. How would anadaptive duty cycling strategy fit with such a scheme,which requires significant data transfer? To understandthe coupling between the two schemes, we look at therouting requirements in more detail. Routing is requiredfor two purposes in Dimensions: (a) to transfer databetween clusterheads at different levels, and (b) to routequeries in a drill-down manner. In the first case, latencyis not a concern, since this is a background, slow-running process. Thus, a scheme like STEM [35] wouldnot be necessary. Further, the periodicity of commu-nication is fixed, since summaries are generated everyepoch. These requirements lend the problem to beparticularly well-suited for a scheduling-based protocolsuch as S-MAC [44]. The advantages of using SMAC forlong messages can be exploited to send the summariesefficiently.Routing queries does not require large data transfers,

but have two features: (a) they can be latency-limited, inthe case of an interactive user session, and (b) queriesmay arrive at random times, and the user couldpotentially query the network from any location withinit. For the first case, an on-demand wakeup scheme suchas STEM can be used if multiple radios are available ateach node. Such a protocol would ensure that queriesare answered with low latency, albeit at higher cost. For

ARTICLE IN PRESS

Clusterhead

Multihop routing using GPSR

Routing in Dimensions

Fig. 9. Routing between clusterheads at different levels of the

hierarchy involves using the GPSR geographic routing protocol.

D. Ganesan et al. / J. Parallel Distrib. Comput. 64 (2004) 799–814812

the second case, a network backbone would need to becontinually maintained, since user queries can beinjected into the network anytime and from anywhere.A protocol like ASCENT [7], that maintains a networkrouting topology continually, would enable such query-ing, while being robust to environmental dynamics.

6.2.2. Debugging the network deployment

The following example can illustrate why debugging adeployed network such as the one for environmentalmonitoring can be difficult.

An unexpected situation has arisen and performance of

the deployed sensor network has degraded significantly:queries are lost, throughput is low, individual nodes cannot

be contacted, reliable transmission is slow and involves

many retransmissions. A maintenance engineer is called to

the site to look into the situation. There are a number of

possible causes: node failures causing a ill connected

network, interference from new sources, general battery

degradation over time requiring adjustment of operating

parameters, etc. A significant amount of debugging

information is required to clearly understand the problem.

Debugging each individual node by physically accessing it

is impractical given the scale, and infeasible due to the

lack of sufficient personnel.How can the engineer design a networked debugging

scheme to tackle such problems? A suite of debuggingtools such as eScan can be appropriately used to solvethe problems. For the above instance, periodic digests

[48] can provide clues to the engineer to help debug theproblem.

6.2.3. Geographic data forwarding

The first step in constructing the distributed storagestructure is the selection of cluster nodes that areresponsible for data storage at different resolutions.Dimensions builds on prior work in Data CentricStorage (DCS [33]) that constructs a Distributed HashTable (DHT [34]) to hash the name of a certain eventkey (dataName) to a location within the network. Eachnode in the network uses the same hash function, andthus global consensus on the mapping between locationsand keys is achieved.This procedure is slightly modified in Dimensions to

elect a clusterhead to represent each square grid using aglobally well-known hash function. A geographic hashfunction is used that accepts the location of the sourcedata and current time, and returns the locationcorresponding to the clusterhead is used.Two routing questions are posed by such an

approach. First, how is routing done to the chosenlocation? Second, how can one ensure that a node ispresent at the location, since the hash function operateswithout a knowledge of the global topology?GPSR, described in Section 4.2, can be used to

address the first concern, i.e. to forward data to the

destination location obtained from the hash function.To address the second problem, the perimeter forward-ing strategy in GPSR is cleverly used in DCS [33] toconsistently deliver data addressed to a location to thenode closest to the target location. Thus, the storageapplication does not need to deal with this translation(Fig. 9).

6.2.4. Drill-down queries

Sensor data for the application is identified by nodelocation, time, and sensor type. Queries on the data canbe expressed using these attributes and operators tomanipulate them.Consider a simple MAX query that operates on

temperature data from the network: Find the Max

temperature between Jan and July, 2003. The query isfirst routed to the root, which has a coarse summarycorresponding to the entire network. The root runs thequery over its local summary and identifies whichquadrant is likely to have the maximum temperaturefor the specified period. Since the root has only a coarsesummary, the result returned by this processing is likelyto be approximate. However, further drill-down is likelyto improve the accuracy of the result. The query isgeographically routed to the clusterhead correspondingto the quadrant, and the procedure repeats.How can drill-down query processing be implemented

using Directed Diffusion? Nested queries, discussed in[18], provide a natural way of expressing drill-downquerying. Nested queries can be implemented byenabling code at each triggered sensor that watches fora nested query. This code then generates a sub-querythat is routed to the relevant quadrant for furtherrefining the approximate query answer. Each furtherlevel of drilldown can be expressed as an additional levelof nesting, where the result of the processing determinesthe next level query.

ARTICLE IN PRESSD. Ganesan et al. / J. Parallel Distrib. Comput. 64 (2004) 799–814 813

7. Conclusions

Wireless sensor networks enable dense sensing of theenvironment, offering unprecedented opportunities forobserving the physical world. These systems offerunique challenges to researchers: scale, unpredictablewireless communication conditions, severely energy andresource constraints. In this paper, we have summarizedresearch in networking techniques for sensor networks,focusing on representative approaches.As the sensor network community moves out of its

infancy, and embark on real deployments [16,27], manyof these networking schemes that have been developedare being put to test in real environments. Eachapplication domain will, no doubt, introduce novelchallenges, and involve application-specific optimizationto the networking schemes that we have discussed.However, the schemes that we have discussed provide abroad library of techniques from which to selectappropriate strategies.

References

[1] A.A. Abidi, G.J. Pottie, W.J. Kaiser, Power-conscious design of

wireless circuits and systems, Proc. IEEE 88 (10) (October 2002)

1528–1545.

[2] W. Adjie-Winoto, E. Schwartz, H. Balakrishnan, J. Lilley, The

design and implementation of an intentional naming system,

in: Proceedings of the 17th Symposium on Operating

Systems Principles, Kiawah Island, SC, USA, December 1999,

pp. 186–201.

[3] I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, E. Cayirci, A

survey on sensor networks, IEEE commun. Mag. 40 (8) (August

2002) 102–114.

[4] F. Bennett, D. Clarke, J.B. Evans, A. Hopper, A. Jones, D. Leask,

Piconet: embedded mobile networking, IEEE Pers. Commun.

Mag. 4 (5) (October 1997) 8–15.

[5] P. Bonnet, P. Seshadri, Device database systems, in: 16th

International Conference on Data Engineering, San Diego,

California, February 2000.

[6] J. Case, M. Fedor, M.L. Schoffstall, C. Davin, Simple Network

Management Protocol, Technical Report, internet-rfc, November

1996.

[7] A. Cerpa, D. Estrin, Ascent: adaptive self-configuring sEnsor

networks topologies, in: Proceedings of the IEEE Infocom, New

York, NY, June 2002, IEEE, New York.

[8] B. Chen, K. Jamieson, H. Balakrishnan, R. Morris, Span: an

energy-efficient coordination algorithm for topology maintenance

in ad hoc wireless networks, in: Proceedings of the ACM/IEEE

International Conference on Mobile Computing and Networking

(Mobicom), Rome, Italy, July 2001, ACM, New York, 2001,

pp. 85–96.

[9] S. Chessa, P. Santi, Comparison based system-level fault

diagnosis in ad hoc networks, in: Proceedings of the 20th,

October 2001.

[10] B. Deb, S. Bhatnagar, B. Nath, Multi-resolution state retrieval in

sensor networks, in: Proceedings of the IEEE ICC Workshop on

Sensor Network Protocols and Applications, Anchorage, AK,

May 2003, to appear.

[11] D. Estrin, R. Govindan, J. Heidemann, S. Kumar, Next

century challenges: scalable coordination in sensor networks, in:

Proceedings of the Fifth Annual ACM/IEEE International

Conference on Mobile Computing and Networking, 1999,

pp. 263–270.

[12] G.G. Finn, Routing and addressing problems in large metropo-

litan-scale internetworks, Technical Report ISI/RR-87-180, USC/

ISI, March 1987.

[13] D. Ganesan, D. Estrin, J. Heidemann, Dimensions: why do we

need a new data handling architecture for sensor networks?, in:

First Workshop on Hot Topics in Networks (Hotnets-I), Vol. 1,

October 2002.

[14] D. Ganesan, B. Greenstein, D. Perelyubskiy, D. Estrin, J.

Heidemann, An evaluation of multi-resolution search and storage

in resource-constrained sensor networks, in: Proceedings of the

First ACM Conference on Embedded Networked Sensor Systems

(SenSys), 2003, to appear.

[15] J.C. Haartsen, The Bluetooth radio system. IEEE Pers. Commun.

Mag. February 2000, 28–36.

[16] M. Hamilton, James San Jacinto Mountains Reserve.

[17] J. Heidemann, F. Silva, D. Estrin, Matching data dissemination

algorithms to application requirements, Technical Report

ISI-TR-571, USC/Information Sciences Institute, April 2003.

[18] J. Heidemann, F. Silva, C. Intanagonwiwat, R. Govindan, D.

Estrin, D. Ganesan, Building efficient wireless sensor net-

works with low-level naming, in: Proceedings of the Symposium

on Operating Systems Principles, Chateau Lake Louise,

Banff, Alberta, Canada, October 2001, ACM, New York,

pp. 146–159.

[19] W.R. Heinzelman, A. Chandrakasan, H. Balakrishnan, Energy-

efficient communication protocols for wireless microsensor net-

works, in: Proceedings of the Hawaii International Conference on

Systems Sciences, January 2000.

[20] J. Hill, R. Szewczyk, A. Woo, S. Hollar, D. Culler, K. Pister,

System architecture directions for networked sensors, in: Proceed-

ings of the Ninth International Conference on Architectural

Support for Programming Languages and Operating Systems

(ASPLOS-IX), Cambridge, MA, USA, November 2000, ACM,

New York, pp. 93–104.

[21] T. Imielinski, S. Goel, DataSpace: querying and monitoring

deeply networked collections in physical space, IEEE Personal

Commun. Special Issue on Smart Spaces and Environments 7 (5)

(October 2002) 4–9.

[22] C. Intanagonwiwat, R. Govindan, D. Estrin, Directed diffusion: a

scalable and robust communication paradigm for sensor net-

works, in: Proceedings of the ACM/IEEE International Con-

ference on Mobile Computing and Networking (Mobicom),

Boston, MA, USA, ACM, New York, August 2000, pp. 56–67.

[23] B. Karp, H.T. Kung, Gpsr: greedy perimeter stateless routing for

wireless networks, in: Proceedings of ACM Mobicom, Boston,

MA, 2000.

[24] O. Kasten, Energy consumption, http://www.inf.ethz.ch/~kasten/

research/bathtub/energy consumption.html. Eldgenossische

Technische Hochschule Zurich.

[25] LAN MAN Standards Committee of the IEEE Computer

Society, Wireless LAN Medium Access Control (MAC) and

Physical Layer (PHY) Specification, IEEE Std 802.11-1999

edition, IEEE, New York, 1999.

[26] S. Madden, M. Franklin, J. Hellerstein, W. Hong, Tag: a tiny

aggregation service for ad hoc sensor networks, in: OSDI, Vol. 1,

Boston, MA, 2002.

[27] A. Mainwaring, J. Polastre, R. Szewczyk, D. Culler, J. Anderson,

Wireless sensor networks for habitat monitoring, in: ACM

International Workshop on Wireless Sensor Networks and

Applications, Atlanta, GA, September 2002.

[28] S. Meguerdichian, F. Koushanfar, M. Potkonjak, M. Srivastava,

Coverage problems in wireless ad hoc sensor networks, in:

Proceedings of the IEEE Infocom, 2001.

ARTICLE IN PRESSD. Ganesan et al. / J. Parallel Distrib. Comput. 64 (2004) 799–814814

[29] S. Meguerdichian, F. Koushanfar, G. Qu, M. Potkonjak,

Exposure in wireless ad hoc sensor networks, in: Proceedings of

the ACM/IEEE International Conference on Mobile Computing

and Networking (Mobicom), July 2001, pp. 139–150.

[30] R.A.F. Mini, B. Nath, A.A.F. Loureiro, Prediction-based

approaches to construct the energy map for wireless sensor

networks, in: 21 Simpsio Brasileiro de Redes de Computadores,

Natal, RN, Brasil, May 2003, to appear.

[31] L.L. Peterson, A yellow-pages service for a local-area network,

Proceedings of ACM SIGCOMM ’87, August 1987, pp. 235–242.

[32] G.J. Pottie, W.J. Kaiser, Wireless integrated network sensors,

Commun. ACM 43 (5) (May 2000) 51–58.

[33] S. Ratnasamy, D. Estrin, R. Govindan, B. Karp, L. Yin, S.

Shenker, F. Yu, Data-centric storage in sensornets, in: ACM First

Workshop on Hot Topics in Networks, 2001.

[34] S. Ratnasamy, B. Karp, L. Yin, F. Yu, D. Estrin, R. Govindan, S.

Shenker, Ght—a geographic hash-table for data-centric storage,

in: First ACM International Workshop on Wireless Sensor

Networks and their Applications, 2002.

[35] C. Schurgers, V. Tsiatsis, S. Ganeriwal, M. Srivastava, Optimiz-

ing sensor networks in the energy-latency-density space, IEEE

Trans. Mobile Comput. 1 (1) (2002) 70–80.

[36] E. Shih, P. Bahl, M. Sinclair, Wake on wireless: an event driven

energy saving strategy for battery operated devices, in: Proceed-

ings of the ACM/IEEE International Conference on Mobile

Computing and Networking (Mobicom), Atlanta, GA, September

2002, ACM, New York, 2002.

[37] F. Silva, J. Heidemann, R. Govindan, Network Routing

Application Programmer’s Interface (API) and Walk Through

9.0.1, USC/Information Sciences Institute, December 2002.

[38] K. Sohrabi, G.J. Pottie, Performance of a novel self-organization

protocol for wireless ad hoc sensor networks, in: Proceedings

of the IEEE 50th Vehicular Technology Conference, 1999,

pp. 1222–1226.

[39] J. Staddon, D. Balfanz, G. Durfee, Efficient tracing of failed

nodes in sensor networks, in: Proceedings of the First ACM

International Workshop on Wireless Sensor Networks and

Applications, Atlanta, USA, September 2002, pp. 122–130.

[40] M. Stemm, R.H. Katz, Measuring and reducing energy con-

sumption of network interfaces in hand-held devices, IEICE

Trans. Commun. E80-B (8) (August 1997) 1125–1131.

[41] Ten emerging technologies that will change the world, Technol.

Rev. (2003).

[42] Y.-C. Tseng, C.-S. Hsu, T.-Y. Hsieh, Power-saving protocols for

IEEE 802.11-based multi-hop ad hoc networks, in: Proceedings of

the IEEE Infocom, New York, June 2002, IEEE, New York,

pp. 200–209.

[43] Y. Xu, J. Heidemann, D. Estrin, Geography-informed energy

conservation for ad hoc routing, in: Proceedings of the ACM/

IEEE International Conference on Mobile Computing and

Networking (Mobicom), Rome, Italy, July 2001, ACM, New

York, pp. 70–84.

[44] W. Ye, J. Heidemann, D. Estrin, An energy-efficient mac protocol

for wireless sensor networks. in: Proceedings of the IEEE

Infocom, New York, NY, June 2002, IEEE, New York, 2002,

pp. 1567–1576.

[45] W. Ye, J. Heidemann, D. Estrin, Medium access control with

coordinated, adaptive sleeping for wireless sensor networks,

Technical Report ISI-TR-567, USC Information Sciences In-

stitute, January 2003, IEEE/ACM Transactions on Networking

to appear.

[46] Y. Yu, R. Govindan, D. Estrin, Geographical and energy aware

routing: a recursive data dissemination protocol for wireless

sensor networks, Technical Report, May 2001.

[47] Y. Zhao, R. Govindan, D. Estrin, Residual energy scans for

monitoring wireless sensor networks, in: Proceedings of the IEEE

Wireless Communications and Networking Conference, March

2002.

[48] J. Zhao, R. Govindan, D. Estrin, Computing aggregates for

monitoring wireless sensor networks, in: Proceedings of the IEEE

ICC Workshop on Sensor Network Protocols and Applications,

Anchorage, AK, May 2003, to appear.