Querying Sensor Networks — Sam Madden, UC Berkeley. December 13th, New England Database Seminar.
1
Querying Sensor Networks
Sam Madden, UC Berkeley
December 13th, New England Database Seminar
2
TinyDB Introduction
• What is a sensor network?
• Programming sensor nets is hard!
• Declarative queries are easy
– TinyDB: In-network processing via declarative queries
• Example:
» Vehicle tracking application: 2 weeks for 2 students
» Vehicle tracking query: took 2 minutes to write, worked just as well!

SELECT nodeid FROM sensors
WHERE mag > thresh
EPOCH DURATION 64ms
3
Overview
• Sensor Networks
• Why Queries in Sensor Nets
• TinyDB
– Features
– Demo
• Focus: Acquisitional Query Processing
4
Overview
• Sensor Networks
• Why Queries in Sensor Nets
• TinyDB
– Features
– Demo
• Focus: Acquisitional Query Processing
5
Device Capabilities
• “Mica Motes”
– 8-bit, 4MHz processor
» Roughly a PC AT
– 40kbit CSMA radio
– 4KB RAM, 128K flash, 512K EEPROM
– Sensor board expansion slot
» Standard board has light & temperature sensors, accelerometer, magnetometer, microphone, & buzzer
• Other more powerful platforms exist
– E.g. UCLA WINS, Medusa, MIT Cricket, Princeton Zebranet
• Trend towards smaller devices
– “Smart Dust” – Kris Pister, et al.
6
Sensor Net Sample Apps
Habitat monitoring: storm petrels on Great Duck Island, microclimates on James Reserve.
Traditional monitoring apparatus.
Earthquake monitoring in shake-test sites.
Vehicle detection: sensors dropped from UAV along a road, collect data about passing vehicles, relay data back to UAV.
7
Metric: Communication
• Lifetime from one pair of AA batteries
– 2-3 days at full power
– 6 months at 2% duty cycle
• Communication dominates cost
– < few ms to compute
– 30 ms to send message
• Our metric: communication!
[Chart: Time v. Current Draw During Query Processing — current (mA, 0–20) over a 3 s window, with labeled phases: Snoozing, Processing, Processing and Listening, Transmitting]
8
TinyOS
• Operating system from David Culler’s group at Berkeley
• C-like programming environment
• Provides messaging layer, abstractions for major hardware components
– Split-phase, highly asynchronous, interrupt-driven programming model
Hill, Szewczyk, Woo, Culler, & Pister. “Systems Architecture Directions for Networked Sensors.” ASPLOS 2000. See http://webs.cs.berkeley.edu/tos
9
Communication In Sensor Nets
• Radio communication has high link-level losses
– typically about 20% @ 5m
• Ad-hoc neighbor discovery
• Tree-based routing
[Figure: multihop routing tree over nodes A–F]
10
Overview
• Sensor Networks
• Why Queries in Sensor Nets
• TinyDB
– Features
– Demo
• Acquisitional Query Processing
11
Declarative Queries for Sensor Networks
• Examples:

SELECT nodeid, light
FROM sensors
WHERE light > 400
EPOCH DURATION 1s
Sensors:

Epoch | Nodeid | Light | Temp | Accel | Sound
0     | 1      | 455   | x    | x     | x
0     | 2      | 389   | x    | x     | x
1     | 1      | 422   | x    | x     | x
1     | 2      | 405   | x    | x     | x
12
Aggregation Queries
SELECT AVG(sound)
FROM sensors
EPOCH DURATION 10s

Epoch | AVG(sound)
0     | 440
1     | 445

SELECT roomNo, AVG(sound)
FROM sensors
GROUP BY roomNo
HAVING AVG(sound) > 200
EPOCH DURATION 10s

Epoch | roomNo | AVG(sound)
0     | 1      | 360
0     | 2      | 520
1     | 1      | 370
1     | 2      | 520

Rooms w/ sound > 200: rooms 2 and 3
13
Declarative Benefits In Sensor Networks
• Reduces Complexity
– Locations as predicates
– Operations are over groups
• Fault Tolerance
– Control of when & where
• Data independence
– Control of representation & storage
» Indices, join location, materialization points, RAM vs EEPROM, etc.
14
Computing In Sensor Nets Is Hard
– Limited power
» Power-based query optimization
» Routing indices
» In-network computation
» Exploitation of operator semantics (TAG, OSDI 2002)
– Lossy, low-bandwidth communication
» In-network computation
» Caching, retransmission, etc.
» Data prioritization
– Remote, zero-administration, long-lived deployments
» Ad-hoc (fault-tolerant) networking
» Lifetime estimation
» Semantically aware routing (TAG)
– Limited processing capabilities, storage
15
Overview
• Sensor Networks
• Why Queries in Sensor Nets
• TinyDB
– Features
– Demo
• Focus: Tiny Aggregation
• The Next Step
16
TinyDB
• A distributed query processor for networks of Mica motes
– Available today!
• Goal: Eliminate the need to write C code for most TinyOS users
• Features
– Declarative queries
– Temporal + spatial operations
– Multihop routing
– In-network storage
17
TinyDB @ 10000 Ft

[Figure: a query is disseminated down a routing tree of nodes A–F; partial results {D,E,F}, {B,D,E,F}, {A,B,C,D,E,F} flow back up each link toward the root]
Written in SQL-like language with extensions for:
• Sample rate
• Offline delivery
• Temporal aggregation

(Almost) all queries are continuous and periodic
18
TinyDB Demo
19
Applications + Early Adopters
• Some demo apps:
– Network monitoring
– Vehicle tracking
• “Real” future deployments:
– Environmental monitoring @ GDI (and James Reserve?)
– Generic Sensor Kit
– Building Monitoring
Demo!
20
Benefit of TinyDB
SELECT COUNT(light)
SAMPLE PERIOD 4s

• Cost metric = #msgs
• 16 nodes
• 150 Epochs
• In-net loss rates: 5%
• Centralized loss: 15%
• Network depth: 4
[Chart: In-Network vs. Out-of-Network Aggregation — # messages (0–5000) for In-Network vs. External aggregation]
21
TinyDB Architecture (Per node)

[Diagram: Radio Stack ↔ TupleRouter ↔ Network; TupleRouter uses Schema, TinyAlloc, AggOperator, and SelOperator]

TupleRouter:
• Fetches readings (for ready queries)
• Builds tuples
• Applies operators
• Delivers results (up tree)
AggOperator:
• Combines local & neighbor readings
SelOperator:
• Filters readings
Schema:
• “Catalog” of commands & attributes (more later)
TinyAlloc:
• Reusable memory allocator!

~10,000 lines C code
~5,000 lines Java
~3200 bytes RAM (w/ 768 byte heap)
~58 kB compiled code (3x larger than 2nd largest TinyOS program)
22
Catalog & Schema Manager
• Attribute & Command IF
– Components register attributes and commands they support
» Commands implemented via wiring
» Attributes fetched via accessor command
– Catalog API allows local and remote queries over known attributes / commands.
• Sensor-specific metadata
– Power to access attributes
– Time to access attributes
23
Overview
• Sensor Networks
• Why Queries in Sensor Nets
• TinyDB
– Features
– Demo
• Acquisitional Query Processing
24
Acquisitional Query Processing
• Cynical question: what’s really different about sensor networks?
– Low Power? → Laptops!
– Lots of Nodes? → Distributed DBs!
– Limited Processing Capabilities? → Moore’s Law!

Being a little bit facetious, but…
25
Answer
• Long-running queries on physically embedded devices that control when, and with what frequency, data is collected!
• Versus traditional systems where data is provided a priori
26
ACQP: What’s Different?
• How does the user control acquisition?
– Rates or lifetimes
– Event-based triggers
• How should the query be processed?
– Sampling as an operator!
– Events as joins
• Which nodes have relevant data?
– Semantic Routing Tree
» Nodes that are queried together route together
• Which samples should be transmitted?
– Pick most “valuable”?
27
ACQP
• How does the user control acquisition?
– Rates or lifetimes
– Event-based triggers
• How should the query be processed?
– Sampling as an operator!
– Events as joins
• Which nodes have relevant data?
– Semantic Routing Tree
» Nodes that are queried together route together
• Which samples should be transmitted?
– Pick most “valuable”?
28
Lifetime Queries
• Lifetime vs. sample rate
SELECT …
LIFETIME 30 days

SELECT …
LIFETIME 10 days
MIN SAMPLE INTERVAL 1s

Implies not all data is xmitted
29
Processing Lifetimes
• At root
– Compute SAMPLE PERIOD that satisfies lifetime
– If it exceeds MIN SAMPLE PERIOD (MSP), use MSP and compute transmission rate
• At other nodes
– Use root’s values or slower
• Root = bottleneck
– Multiple roots?
– Adaptive roots?
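The root's computation above can be sketched in a few lines. This is not TinyDB's actual code: the function name, the energy constants, and the simplification that each sample carries a fixed energy cost (including its share of transmission) are all illustrative assumptions.

```python
def plan_for_lifetime(lifetime_s, battery_j, e_sample_j, msp_s=None):
    """Pick a sample period that spreads battery_j joules over lifetime_s
    seconds, assuming each sample costs e_sample_j joules in total.
    If the budget-derived period is slower than the user's MIN SAMPLE
    PERIOD, sample at the MSP instead and return the fraction of
    samples the budget still allows to be transmitted."""
    budget_per_s = battery_j / lifetime_s      # sustainable power draw
    period = e_sample_j / budget_per_s         # period that exactly spends it
    if msp_s is not None and period > msp_s:
        return msp_s, msp_s / period           # sample fast, xmit a fraction
    return period, 1.0
```

With a 10-day lifetime, an (assumed) 86400 J battery and 1 J per sample, the budget only supports one sample every 10 s; adding MIN SAMPLE INTERVAL 1s yields a 1 s period with only 1 in 10 results transmitted — which is why the previous slide notes that not all data is xmitted.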
30
Lifetime Based Queries
31
Event Based Processing
• ACQP – want to initiate queries in response to events
CREATE BUFFER birds(uint16 cnt)
SIZE 1

ON EVENT bird-enter(…)
SELECT b.cnt+1
FROM birds AS b
OUTPUT INTO b
ONCE

In-network storage; subject to optimization
32
More Events
ON EVENT bird_detect(loc) AS bd
SELECT AVG(s.light), AVG(s.temp)
FROM sensors AS s
WHERE dist(bd.loc, s.loc) < 10m
SAMPLE PERIOD 1s FOR 10
[Coming soon!]
33
ACQP
• How does the user control acquisition?– Rates or lifetimes.– Event-based triggers
• How should the query be processed?– Sampling as an operator!– Events as joins
• Which nodes have relevant data?– Semantic Routing Tree
» Nodes that are queried together route together
• Which samples should be transmitted?– Pick most “valuable”?
34
Operator Ordering: Interleave Sampling + Selection
SELECT light, mag
FROM sensors
WHERE pred1(mag)
AND pred2(light)
SAMPLE INTERVAL 1s

• E(mag) >> E(light)
• 1500 uJ vs. 90 uJ
• Possible orderings:
1. Sample light; Sample mag; Apply pred1; Apply pred2
2. Sample light; Apply pred2; Sample mag; Apply pred1
3. Sample mag; Apply pred1; Sample light; Apply pred2

At 1 sample / sec, total power savings could be as much as 4mW, same as the processor!
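The three orderings can be compared with a small expected-cost model. The per-sample energies come from the slide; the 0.5 selectivities and the cost model itself are illustrative assumptions, not measured values.

```python
E_MAG, E_LIGHT = 1500.0, 90.0   # uJ per sample, from the slide

def expected_cost(steps):
    """steps: sequence of ('sample', energy_uJ) or ('filter', selectivity).
    A sample's cost is paid with the probability that execution reaches
    it; each filter scales that probability by its selectivity."""
    total, p = 0.0, 1.0
    for kind, val in steps:
        if kind == 'sample':
            total += p * val
        else:
            p *= val
    return total

sel1 = sel2 = 0.5   # assumed selectivities of pred1(mag) and pred2(light)
o1 = expected_cost([('sample', E_LIGHT), ('sample', E_MAG),
                    ('filter', sel1), ('filter', sel2)])
o2 = expected_cost([('sample', E_LIGHT), ('filter', sel2),
                    ('sample', E_MAG), ('filter', sel1)])
o3 = expected_cost([('sample', E_MAG), ('filter', sel1),
                    ('sample', E_LIGHT), ('filter', sel2)])
```

Under these assumptions ordering 2 costs 840 uJ per tuple versus 1590 uJ for ordering 1: sampling the cheap sensor first and filtering before touching the magnetometer avoids the expensive sample half the time.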
35
Optimizing in ACQP
• Sampling = “expensive predicate”
• Some subtleties:
– Which predicate to “charge”?
– Can’t operate without samples
• Solution:
– Treat sampling as a separate task
– Build a partial order
– Solve for cheapest schedule using series-parallel scheduling algorithm
» Monma & Sidney, 1979, as in Ibaraki & Kameda, TODS, 1984, or Hellerstein, TODS, 1998.
36
Exemplary Aggregate Pushdown
SELECT WINMAX(light,8s,8s)
FROM sensors
WHERE mag > x
SAMPLE INTERVAL 1s

Unless mag > x is very selective, correct ordering is:
Sample light
Check if it’s the maximum
If it is:
  Sample mag
  Check predicate
  If satisfied, update maximum
37
Event-Join Duality
ON EVENT E(nodeid)
SELECT a
FROM sensors AS s
WHERE s.nodeid = e.nodeid
SAMPLE INTERVAL d FOR k

• Problem: multiple outstanding queries (lots of samples)
• High event frequency → use rewrite:

SELECT s.a
FROM sensors AS s, events AS e
WHERE s.nodeid = e.nodeid
AND e.type = E
AND s.time – e.time < k
AND s.time > e.time
SAMPLE INTERVAL d

• Rewrite problem: phase alignment!
• Solution: subsample
38
ACQP
• How does the user control acquisition?
– Rates or lifetimes
– Event-based triggers
• How should the query be processed?
– Sampling as an operator!
– Events as joins
• Which nodes have relevant data?
– Semantic Routing Tree
» Nodes that are queried together route together
• Which samples should be transmitted?
– Pick most “valuable”?
39
Attribute Driven Topology Selection
• Observation: internal queries often over local area
– Or some other subset of the network
» E.g. regions with light value in [10,20]
• Idea: build topology for those queries based on values of range-selected attributes
– For range queries
– Relatively static trees
» Maintenance cost
40
Attribute Driven Query Propagation
[Figure: routing tree with root 4 and children 1, 2, 3, whose subtrees cover value intervals [1,10], [7,15], [20,40]]

SELECT …
WHERE a > 5 AND a < 12

Precomputed intervals = Semantic Routing Tree (SRT)
41
Attribute Driven Parent Selection
[Figure: node with interval [3,6] choosing among parents 1, 2, 3 with intervals [1,10], [7,15], [20,40]]

[3,6] ∩ [1,10] = [3,6]
[3,6] ∩ [7,15] = ø
[3,6] ∩ [20,40] = ø

Even without intervals, expect that sending to parent with closest value will help
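Dissemination over a semantic routing tree can be sketched as follows: each parent stores the value interval covering its subtree, and a range query is forwarded only into subtrees whose interval overlaps it. The tree shape and intervals follow the slide; the data-structure layout is an illustrative assumption.

```python
def overlaps(a, b):
    """Closed intervals (lo, hi) intersect iff each starts before the
    other ends."""
    return a[0] <= b[1] and b[0] <= a[1]

def visit(node, query, children, intervals):
    """Yield ids of nodes a range query reaches; the root always
    processes the query, and non-overlapping subtrees are pruned."""
    yield node
    for child in children.get(node, []):
        if overlaps(intervals[child], query):
            yield from visit(child, query, children, intervals)

CHILDREN = {4: [1, 2, 3]}                       # root 4, three subtrees
INTERVALS = {1: (1, 10), 2: (7, 15), 3: (20, 40)}
```

For the slide's query (a > 5 AND a < 12, i.e. the range [6,11]), only the subtrees under nodes 1 and 2 are visited; the [20,40] subtree is pruned.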
42
Simulation Result
~14% Reduction
43
ACQP
• How does the user control acquisition?
– Rates or lifetimes
– Event-based triggers
• How should the query be processed?
– Sampling as an operator!
– Events as joins
• Which nodes have relevant data?
– Semantic Routing Tree
» Nodes that are queried together route together
• Which samples should be transmitted?
– Pick most “valuable”?
44
Adaptive Rate Control

[Chart: Sample Rate vs. Delivery Rate — aggregate delivery rate (packets/second, 0–8) vs. samples per second per mote (0–16), for 1 mote, 4 motes, and 4 motes adaptive]

Adaptive = 2x successful xmissions
45
Delta Encoding
• Must pick most valuable data
• How?
– Domain dependent
» E.g., largest, average, shape preserving, frequency preserving, most samples, etc.
• Simple idea for time-series: order biggest-change-first
46
Choosing Data To Send
• Score each item
• Send largest score
– Out of order → priority queue
• Discard / aggregate when full

[Figure: buffered (time, value) points [1,2] [2,6] [3,15] [4,1] [5,4]; at t=5 the full queue aggregates into [5,2.5]]
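One way to sketch biggest-change-first ordering: greedily transmit the buffered point whose value differs most from the already-sent point nearest to it in time. This is an illustrative policy, not TinyDB's exact scoring (the slides' figures differ slightly in how ties and neighbors are handled).

```python
def send_order(points):
    """points: dict mapping time -> value. Returns times in transmit
    order: bootstrap with the earliest point, then repeatedly send the
    pending point that deviates most from its nearest sent neighbor."""
    pending = dict(points)
    sent, order = {}, []
    t0 = min(pending)                       # bootstrap with earliest point
    sent[t0] = pending.pop(t0)
    order.append(t0)
    while pending:
        def score(t):
            near = min(sent, key=lambda s: abs(s - t))
            return abs(pending[t] - sent[near])
        t = max(pending, key=score)         # biggest change goes first
        sent[t] = pending.pop(t)
        order.append(t)
    return order
```

On the slide's series {1:2, 2:6, 3:15, 4:1} this sends the spike at t=3 right after the bootstrap point, so a receiver that sees only a prefix of the transmission still has the most informative samples.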
47
Choosing Data To Send
t=1: [1,2] arrives

[Chart: Time vs. Value — values 0–16 plotted over times 1–4]
48
Choosing Data To Send
t=5: [2,6], [3,15], [4,1] scored against the sent point [1,2]: |2-6| = 4, |2-15| = 13, |2-4| = 2

[Chart: Time vs. Value]
49
Choosing Data To Send
t=5: [3,15] sent; remaining points rescored against nearest sent values: |2-6| = 4, |15-4| = 11

[Chart: Time vs. Value]
50
Choosing Data To Send
t=5: [4,1] sent next

[Chart: Time vs. Value]
51
Choosing Data To Send
t=5: [2,6] sent last

[Chart: Time vs. Value]
52
Delta + Adaptivity
• 8 element queue
• 4 motes transmitting different signals
• 8 samples /sec / mote
53
Aggregate Prioritization
• Insight: shared channel enables nodes to hear neighbor values
• Suppress values that won’t affect aggregate
– E.g., MAX
– Applies to all exemplary, monotonic aggregates, e.g. top/bottom N, MIN, MAX, etc.
54
Hypothesis Testing
• Insight: guess from root can be used for suppression
– E.g. ‘MIN < 50’
– Works for monotonic & exemplary aggregates
» Also summary, if imprecision allowed
• How is hypothesis computed?
– Blind or statistically informed guess
– Observation over network subset
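The suppression rule for a MIN query is tiny: under the root's hypothesis "MIN < guess", a node whose reading is at or above the guess cannot be the minimum and stays silent. The function names and readings below are illustrative.

```python
def reporting_nodes(readings, guess):
    """Readings that must still be transmitted under the hypothesis
    'MIN < guess': only values that could be the minimum report."""
    return [r for r in readings if r < guess]

def suppressed_fraction(readings, guess):
    """Fraction of nodes the hypothesis silences."""
    return 1 - len(reporting_nodes(readings, guess)) / len(readings)
```

The answer is exact as long as the hypothesis is valid (at least one reading lies below the guess); an invalid guess yields no reports and the root must retry with a weaker one.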
55
Simulation: Aggregate Prioritization
• Uniform value distribution
• Dense packing
• Ideal communication

[Chart: Messages/Epoch vs. Network Diameter (SELECT MAX(attr), R(attr) = [0,100]), diameters 10–50; series: No Guess, Guess = 50, Guess = 90, Snooping]
56
ACQP Summary
• Lifetime & event-based queries
– User preferences for when data is acquired
• Optimizations for
– Order of sampling
– Events vs. joins
• Semantic Routing Tree
– Query dissemination
• Runtime prioritization
– Adaptive rate control
– Which samples to send
57
Fun Stuff
• Temporal aggregates
• Sophisticated or sensor-network-specific aggregates
– Mapping
– Tracking
– Wavelets
58
Temporal Aggregates
• TAG was about “spatial” aggregates
– Inter-node, at the same time
• Want to be able to aggregate across time as well
• Two types:
– Windowed: AGG(size, slide, attr)
– Decaying: AGG(comb_func, attr)
– Demo!

[Figure: readings … R1 R2 R3 R4 R5 R6 … with a window of size = 4 advancing by slide = 2]
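The windowed form AGG(size, slide, attr) can be sketched directly: every `slide` new readings, apply the aggregate to the last `size` readings, as in the size = 4, slide = 2 picture above. The implementation is illustrative, not TinyDB's.

```python
def windowed(readings, size, slide, agg):
    """Apply agg to each window of `size` readings, advancing the
    window by `slide` readings between outputs."""
    out = []
    for end in range(size, len(readings) + 1, slide):
        out.append(agg(readings[end - size:end]))
    return out
```

For readings R1..R6 = [1, 2, 3, 4, 5, 6], a windowed MAX with size 4 and slide 2 emits max(R1..R4) = 4 and then max(R3..R6) = 6.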
59
Isobar Finding
60
Summary
• Declarative queries are the right interface for data collection in sensor nets!
– Easier, faster, & more robust
• Acquisitional Query Processing
– Framework for addressing many new issues that arise in sensor networks, e.g.
» Order of sampling and selection
» Languages, indices, approximations that give user control over which data enters the system

TinyDB Release Available - http://telegraph.cs.berkeley.edu/tinydb
61
Questions?
62
Event Based Processing
63
Count vs. Time
64
Simulation Environment
• Chose to simulate to allow 1000’s of nodes and control of topology, connectivity, loss
• Java-based simulation & visualization for validating algorithms, collecting data.
• Coarse-grained event-based simulation
– Sensors arranged on a grid, radio connectivity by Euclidean distance
– Communication model
» Lossless: all neighbors hear all messages
» Lossy: messages lost with probability that increases with distance
» Symmetric links
» No collisions, hidden terminals, etc.
65
Simulation Result
[Chart: Total Bytes Xmitted vs. Aggregation Function (EXTERNAL, MAX, AVERAGE, COUNT, MEDIAN), 0–100,000 bytes]

Simulation results:
• 2500 nodes
• 50x50 grid
• Depth = ~10
• Neighbors = ~20

Some aggregates require dramatically more state!
66
Taxonomy of Aggregates
• TAG insight: classify aggregates according to various functional properties
– Yields a general set of optimizations that can automatically be applied

Property              | Examples                                    | Affects
Partial State         | MEDIAN: unbounded, MAX: 1 record            | Effectiveness of TAG
Duplicate Sensitivity | MIN: dup. insensitive, AVG: dup. sensitive  | Routing Redundancy
Exemplary vs. Summary | MAX: exemplary, COUNT: summary              | Applicability of Sampling, Effect of Loss
Monotonic             | COUNT: monotonic, AVG: non-monotonic        | Hypothesis Testing, Snooping
67
Optimization: Channel Sharing (“Snooping”)
• Insight: shared channel enables optimizations
• Suppress messages that won’t affect aggregate
– E.g., in a MAX query, sensor with value v hears a neighbor with value ≥ v, so it doesn’t report
– Applies to all exemplary, monotonic aggregates
• Learn about query advertisements it missed
– If a sensor shows up in a new environment, it can learn about queries by looking at neighbors’ messages.
» Root doesn’t have to explicitly rebroadcast query!
68
Optimization: Hypothesis Testing
• Insight: root can provide information that will suppress readings that cannot affect the final aggregate value.
– E.g. tell all the nodes that the MIN is definitely < 50; nodes with value ≥ 50 need not participate.
– Works for monotonic & exemplary aggregates
– Can be applied to summary aggregates also if imprecision is allowed
• How is hypothesis computed?
– Blind guess
– Statistically informed guess
– Observation over first few levels of tree / rounds of aggregate
69
Experiment: Hypothesis Testing
Uniform value distribution, dense packing, ideal communication

[Chart: Messages/Epoch vs. Network Diameter (SELECT MAX(attr), R(attr) = [0,100]), diameters 10–50; series: No Guess, Guess = 50, Guess = 90, Snooping]
70
Optimization: Use Multiple Parents
• For duplicate-insensitive aggregates
• Or aggregates that can be expressed as a linear combination of parts
– Send (part of) aggregate to all parents
– Decreases variance
» Dramatically, when there are lots of parents

[Figure: node A sends its count either whole to one of parents B, C, or split 1/2 and 1/2 between them]

No splitting:
E(count) = c * p
Var(count) = c^2 * p * (1-p)

With splitting:
E(count) = 2 * (c/2) * p
Var(count) = 2 * (c/2)^2 * p * (1-p)
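The slide's formulas generalize to k parents and can be checked numerically: splitting a COUNT partial state record c across k independent links of delivery probability p leaves the expectation unchanged and divides the variance by k. The function below is just that algebra, with illustrative inputs.

```python
def count_stats(c, p, k):
    """Expected value and variance of the delivered count when c is
    split into k equal shares sent over independent links that each
    succeed with probability p (sum of k scaled Bernoulli trials)."""
    share = c / k
    mean = k * share * p              # = c * p, independent of k
    var = k * share**2 * p * (1 - p)  # = (c^2 / k) * p * (1 - p)
    return mean, var
```

With c = 4 and p = 0.5, one parent gives variance 4 while two parents give 2; the mean stays at 2 either way, matching the slide's no-splitting vs. splitting expressions for k = 1 and k = 2.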
71
Multiple Parents Results
• Interestingly, this technique is much better than previous analysis predicted!
• Losses aren’t independent!
• Instead of focusing data on a few critical links, spreads data over many links
[Chart: Benefit of Result Splitting (COUNT query) — avg. COUNT with vs. without splitting (2500 nodes, lossy radio model, 6 parents per node); without splitting, loss on a single critical link costs a large share of the count]
72
TAG Summary
• In-network query processing a big win for many aggregate functions
• By exploiting general functional properties of operators, optimizations are possible
– Requires new aggregates to be tagged with their properties
• Up next: non-aggregate query processing optimizations – a flavor of things to come!
73
TAG
• In-network processing of aggregates
– Aggregates are common operation
– Reduces costs depending on type of aggregates
– Focus on “spatial aggregation” (versus “temporal aggregation”)
• Exploitation of operator, functional semantics
Tiny AGgregation (TAG), Madden, Franklin, Hellerstein, Hong. OSDI 2002.
74
Aggregation Framework
• As in extensible databases, we support any aggregation function conforming to:

Agg_n = {f_merge, f_init, f_evaluate}

f_merge{<a1>, <a2>} → <a12>
f_init{a0} → <a0>
f_evaluate{<a1>} → aggregate value

(f_merge associative, commutative!)

Example: Average
AVG_merge{<S1, C1>, <S2, C2>} → <S1 + S2, C1 + C2>
AVG_init{v} → <v, 1>
AVG_evaluate{<S1, C1>} → S1/C1

Each <…> is a Partial State Record (PSR) — just like parallel database systems, e.g. Bubba!
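The AVG decomposition above is short enough to write out; a fold over readings stands in for the tree of merges a TAG network would perform (any order works because f_merge is associative and commutative). This is a sketch of the framework, not TinyDB's C implementation.

```python
def avg_init(v):
    """Turn one reading into a <sum, count> partial state record."""
    return (v, 1)

def avg_merge(a, b):
    """Combine two PSRs; associative and commutative."""
    return (a[0] + b[0], a[1] + b[1])

def avg_evaluate(psr):
    """Applied once at the root to produce the final aggregate."""
    return psr[0] / psr[1]

def in_network_avg(readings):
    """Fold readings as an aggregation tree would: merge PSRs bottom-up,
    evaluate only at the root."""
    psr = avg_init(readings[0])
    for v in readings[1:]:
        psr = avg_merge(psr, avg_init(v))
    return avg_evaluate(psr)
```

Because only the fixed-size <sum, count> pair travels up the tree, each node forwards one PSR per epoch regardless of subtree size.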
75
Query Propagation Review
[Figure: SELECT AVG(light)… disseminated down the routing tree of nodes A–F]
76
Pipelined Aggregates
• After query propagates, during each epoch:
– Each sensor samples local sensors once
– Combines them with PSRs from children
– Outputs PSR representing aggregate state in the previous epoch.
• After (d-1) epochs, PSR for the whole tree output at root
– d = depth of the routing tree
– If desired, partial state from top k levels could be output in kth epoch
• To avoid combining PSRs from different epochs, sensors must cache values from children

[Figure: five-node tree; a value produced by node 5 at time t arrives at node 1 at time (t+3); a value produced by node 2 at time t arrives at time (t+1)]
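The epoch-by-epoch behavior can be simulated in a few lines. The tree below (1 is the root with children 2 and 3; 4 under 3; 5 under 4) is inferred from the illustration slides that follow; the simulation itself is an illustrative sketch.

```python
def pipelined_count(children, root, epochs):
    """Each epoch, every node emits 1 (its own sample) merged with the
    PSRs its children sent in the *previous* epoch. Returns the root's
    output per epoch; the full count first appears after d-1 extra
    epochs, where d is the tree depth."""
    last = {n: 0 for n in children}            # PSR each node sent last epoch
    outputs = []
    for _ in range(epochs):
        new = {n: 1 + sum(last[c] for c in children[n]) for n in children}
        outputs.append(new[root])
        last = new
    return outputs

TREE = {1: [2, 3], 2: [], 3: [4], 4: [5], 5: []}   # depth d = 4
```

Running five epochs yields root outputs 1, 3, 4, 5, 5 — matching the illustration slides: the count of 5 first reaches the root in epoch 4, i.e. after d-1 = 3 extra epochs.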
77
Illustration: Pipelined Aggregation
SELECT COUNT(*) FROM sensors

[Figure: routing tree with root 1, children 2 and 3; 4 under 3; 5 under 4. Depth = d]
78
Illustration: Pipelined Aggregation
Epoch 1 — SELECT COUNT(*) FROM sensors

Epoch # \ Sensor # | 1 | 2 | 3 | 4 | 5
1                  | 1 | 1 | 1 | 1 | 1
79
Illustration: Pipelined Aggregation
Epoch 2 — SELECT COUNT(*) FROM sensors

Epoch # \ Sensor # | 1 | 2 | 3 | 4 | 5
1                  | 1 | 1 | 1 | 1 | 1
2                  | 3 | 1 | 2 | 2 | 1
80
Illustration: Pipelined Aggregation
Epoch 3 — SELECT COUNT(*) FROM sensors

Epoch # \ Sensor # | 1 | 2 | 3 | 4 | 5
1                  | 1 | 1 | 1 | 1 | 1
2                  | 3 | 1 | 2 | 2 | 1
3                  | 4 | 1 | 3 | 2 | 1
81
Illustration: Pipelined Aggregation
Epoch 4 — SELECT COUNT(*) FROM sensors

Epoch # \ Sensor # | 1 | 2 | 3 | 4 | 5
1                  | 1 | 1 | 1 | 1 | 1
2                  | 3 | 1 | 2 | 2 | 1
3                  | 4 | 1 | 3 | 2 | 1
4                  | 5 | 1 | 3 | 2 | 1
82
Illustration: Pipelined Aggregation
Epoch 5 — SELECT COUNT(*) FROM sensors

Epoch # \ Sensor # | 1 | 2 | 3 | 4 | 5
1                  | 1 | 1 | 1 | 1 | 1
2                  | 3 | 1 | 2 | 2 | 1
3                  | 4 | 1 | 3 | 2 | 1
4                  | 5 | 1 | 3 | 2 | 1
5                  | 5 | 1 | 3 | 2 | 1
83
Grouping
• If query is grouped, sensors apply predicate on each epoch
• PSRs tagged with group
• When a PSR (with group) is received:
– If it belongs to a stored group, merge with existing PSR
– If not, just store it
• At the end of each epoch, transmit one PSR per group
84
Group Eviction
• Problem: number of groups in any one iteration may exceed available storage on sensor
• Solution: evict! (Partial Preaggregation*)
– Choose one or more groups to forward up tree
– Rely on nodes further up tree, or root, to recombine groups properly
– What policy to choose?
» Intuitively: least popular group, since don’t want to evict a group that will receive more values this epoch.
» Experiments suggest:
• Policy matters very little
• Evicting as many groups as will fit into a single message is good

* Per-Åke Larson. Data Reduction by Partial Preaggregation. ICDE 2002.
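The least-popular eviction policy can be sketched as follows. The table layout, popularity counters, and single-group eviction are illustrative assumptions; as the slide notes, the real system may evict several groups at once to fill a message.

```python
def insert_with_eviction(table, capacity, group, psr, merge, popularity):
    """Merge `psr` into `table` (group -> PSR). If a new group would
    overflow `capacity`, evict and return the PSR of the group that has
    received the fewest values this epoch, so a node further up the
    tree (or the root) can recombine it."""
    evicted = None
    if group in table:
        table[group] = merge(table[group], psr)
    else:
        if len(table) >= capacity:
            victim = min(table, key=lambda g: popularity[g])
            evicted = (victim, table.pop(victim))
        table[group] = psr
    popularity[group] = popularity.get(group, 0) + 1
    return evicted
```

Because PSR merging is associative and commutative, forwarding a group's PSR early is always safe: the root simply merges the early fragment with later ones.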
85
TAG Advantages
• In-network processing reduces communication
– Important for power and contention
• Continuous stream of results
– In the absence of faults, will converge to right answer
• Lots of optimizations
– Based on shared radio channel
– Semantics of operators
86
Simulation Screenshot
87
Hypothesis Testing For Average
• AVERAGE: each node suppresses readings within some ∆ of an approximate average µ*.
– Parents assume children who don’t report have value µ*
• Computed average cannot be off by more than ∆.
88
TinyAlloc
• Handle-based compacting memory allocator
• For catalog, queries

[Figure: the user program holds handles into a master pointer table, which points into a heap tracked by a free bitmap; compaction slides live blocks together and updates the table]

Handle h;
call MemAlloc.alloc(&h, 10);
…
(*h)[0] = “Sam”;
call MemAlloc.lock(h);
tweakString(*h);
call MemAlloc.unlock(h);
call MemAlloc.free(h);
89
Schema
• Attribute & Command IF
– At INIT(), components register attributes and commands they support
» Commands implemented via wiring
» Attributes fetched via accessor command
– Catalog API allows local and remote queries over known attributes / commands.
• Demo of adding an attribute, executing a command.
90
Q1: Expressiveness
• Simple data collection satisfies most users
• How much of what people want to do is just simple aggregates?
– Anecdotally, most of it
– EE people want filters + simple statistics (unless they can have signal processing)
• However, we’d like to satisfy everyone!
91
Query Language
• New features:
– Joins
– Event-based triggers
» Via extensible catalog
– In-network & nested queries
– Split-phase (offline) delivery
» Via buffers
92
Sample Query 1
Bird counter:

CREATE BUFFER birds(uint16 cnt)
SIZE 1

ON EVENT bird-enter(…)
SELECT b.cnt+1
FROM birds AS b
OUTPUT INTO b
ONCE
93
Sample Query 2
Birds that entered and left within time t of each other:
ON EVENT bird-leave AND bird-enter WITHIN t
SELECT bird-leave.time, bird-leave.nest
WHERE bird-leave.nest = bird-enter.nest
ONCE
94
Sample Query 3
Delta compression:
SELECT light
FROM buf, sensors
WHERE |s.light – buf.light| > t
OUTPUT INTO buf
SAMPLE PERIOD 1s
95
Sample Query 4
Offline delivery + event chaining:

CREATE BUFFER equake_data(
  uint16 loc, uint16 xAccel, uint16 yAccel)
SIZE 1000
PARTITION BY NODE

SELECT xAccel, yAccel
FROM sensors
WHERE xAccel > t OR yAccel > t
SIGNAL shake_start(…)
SAMPLE PERIOD 1s

ON EVENT shake_start(…)
SELECT loc, xAccel, yAccel
FROM sensors
OUTPUT INTO BUFFER equake_data(loc, xAccel, yAccel)
SAMPLE PERIOD 10ms
96
Event Based Processing
• Enables internal and chained actions
• Language semantics
– Events are inter-node
– Buffers can be global
• Implementation plan
– Events and buffers must be local
– Since n-to-n communication not (well) supported
• Next: operator expressiveness
97
Attribute Driven Topology Selection
• Observation: internal queries often over local area*
– Or some other subset of the network
» E.g. regions with light value in [10,20]
• Idea: build topology for those queries based on values of range-selected attributes
– Requires range attributes, connectivity to be relatively static

* Heidemann et al., Building Efficient Wireless Sensor Networks With Low Level Naming. SOSP, 2001.
98
Attribute Driven Query Propagation
[Figure: routing tree with root 4 and children 1, 2, 3, whose subtrees cover value intervals [1,10], [7,15], [20,40]]

SELECT …
WHERE a > 5 AND a < 12

Precomputed intervals == “Query Dissemination Index”
99
Attribute Driven Parent Selection
[Figure: node with interval [3,6] choosing among parents 1, 2, 3 with intervals [1,10], [7,15], [20,40]]

[3,6] ∩ [1,10] = [3,6]
[3,6] ∩ [7,15] = ø
[3,6] ∩ [20,40] = ø

Even without intervals, expect that sending to parent with closest value will help
100
Hot off the press…

[Chart: Nodes Visited vs. Range Query Size for Different Index Policies — number of nodes visited (400 = max) vs. query size as % of value range (0.001–1); random value distribution, 20x20 grid, ideal connectivity to (8) neighbors; series: Best Case (Expected), Closest Parent, Nearest Value, Snooping]
101
Grouping
• GROUP BY expr
– expr is an expression over one or more attributes
» Evaluation of expr yields a group number
» Each reading is a member of exactly one group

Example: SELECT max(light) FROM sensors GROUP BY TRUNC(temp/10)

Sensor ID | Light | Temp | Group
1         | 45    | 25   | 2
2         | 27    | 28   | 2
3         | 66    | 34   | 3
4         | 68    | 37   | 3

Result:

Group | max(light)
2     | 45
3     | 68
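The example query is small enough to execute by hand; a sketch over the slide's four readings (integer division standing in for TRUNC, which holds here since all temps are positive):

```python
def grouped_max(rows):
    """rows: (sensor_id, light, temp) tuples. Evaluates
    SELECT max(light) FROM sensors GROUP BY TRUNC(temp/10),
    returning {group_number: max_light}."""
    out = {}
    for _sid, light, temp in rows:
        g = temp // 10                      # TRUNC(temp/10) -> group number
        out[g] = max(out.get(g, light), light)
    return out

READINGS = [(1, 45, 25), (2, 27, 28), (3, 66, 34), (4, 68, 37)]
```

Applied to READINGS, this reproduces the result table: group 2 (temps in the 20s) has max light 45, group 3 (temps in the 30s) has max light 68.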
102
Having
• HAVING preds
– preds filters out groups that do not satisfy predicate
– versus WHERE, which filters out tuples that do not satisfy predicate
– Example:

SELECT max(temp) FROM sensors
GROUP BY light
HAVING max(temp) < 100

Yields all groups with max temperature under 100
103
Group Eviction
• Problem: number of groups in any one iteration may exceed available storage on sensor
• Solution: evict!
– Choose one or more groups to forward up tree
– Rely on nodes further up tree, or root, to recombine groups properly
– What policy to choose?
» Intuitively: least popular group, since don’t want to evict a group that will receive more values this epoch.
» Experiments suggest:
• Policy matters very little
• Evicting as many groups as will fit into a single message is good
104
Experiment: Basic TAG
Dense packing, ideal communication

[Chart: Avg. Bytes / Epoch vs. Network Diameter (10–50) for COUNT, MAX, AVERAGE, MEDIAN, EXTERNAL, DISTINCT]
105
Experiment: Hypothesis Testing
Uniform value distribution, dense packing, ideal communication

[Chart: Messages/Epoch vs. Network Diameter (10–50); series: No Guess, Guess = 50, Guess = 90, Snooping]
106
Experiment: Effects of Loss
[Chart: Percent Error From Single Loss vs. Network Diameter (10–50) for AVERAGE, COUNT, MAX, MEDIAN]
107
Experiment: Benefit of Cache
[Chart: Percentage of Network Involved vs. Network Diameter (10–50) for No Cache, 5 Rounds Cache, 9 Rounds Cache, 15 Rounds Cache]
108
Pipelined Aggregates
• After query propagates, during each epoch:
– Each sensor samples local sensors once
– Combines them with PSRs from children
– Outputs PSR representing aggregate state in the previous epoch.
• After (d-1) epochs, PSR for the whole tree output at root
– d = depth of the routing tree
– If desired, partial state from top k levels could be output in kth epoch
• To avoid combining PSRs from different epochs, sensors must cache values from children

[Figure: five-node tree; a value produced by node 5 at time t arrives at node 1 at time (t+3); a value produced by node 2 at time t arrives at time (t+1)]
109
Pipelining Example

[Figure: tree with root 1, whose child is 2; 2’s children are 3 and 4; 5 is under 3. Each node caches <SID, Epoch, Agg.> partial state records (PSRs) from itself and its children]

110
Pipelining Example — Epoch 0

Messages sent: <5,0,1>, <4,0,1>

Node 1 cache: <1,0,1>
Node 2 cache: <2,0,1>, <4,0,1>
Node 3 cache: <3,0,1>, <5,0,1>

111
Pipelining Example — Epoch 1

Messages sent: <5,1,1>, <4,1,1>, <3,0,2>, <2,0,2>

Node 1 cache: <1,0,1>, <1,1,1>, <2,0,2>
Node 2 cache: <2,0,1>, <4,0,1>, <2,1,1>, <4,1,1>, <3,0,2>
Node 3 cache: <3,0,1>, <5,0,1>, <3,1,1>, <5,1,1>

112
Pipelining Example — Epoch 2

Messages sent: <5,2,1>, <4,2,1>, <3,1,2>, <2,0,4>, <1,0,3>

Node 1 cache adds: <1,2,1>, <2,0,4>
Node 2 cache adds: <2,2,1>, <4,2,1>, <3,1,2>
Node 3 cache adds: <3,2,1>, <5,2,1>

113
Pipelining Example — Epoch 3

Messages sent: <5,3,1>, <4,3,1>, <3,2,2>, <2,1,4>, <1,0,5>

(The root’s count for epoch 0 is now complete: 5.)

114
Pipelining Example — Epoch 4

Messages sent: <5,4,1>, <4,4,1>, <3,3,2>, <2,2,4>, <1,1,5>
115
Our Stream Semantics
• One stream, ‘sensors’
• We control data rates
• Joins between that stream and buffers are allowed
• Joins are always landmark, forward in time, one tuple at a time
– Result of queries over ‘sensors’ either a single tuple (at time of query) or a stream
• Easy to interface to more sophisticated systems
• Temporal aggregates enable fancy window operations
116
Formal Spec.
ON EVENT <event> [<boolop> <event>... WITHIN <window>]
[SELECT {<expr>|agg(<expr>)|temporalagg(<expr>)}
 FROM [sensors | <buffer> | events]]
[WHERE {<pred>}]
[GROUP BY {<expr>}]
[HAVING {<pred>}]
[ACTION [<command> [WHERE <pred>] |
         BUFFER <bufname> SIGNAL <event>({<params>}) |
         (SELECT ... ) [INTO BUFFER <bufname>]]]
[SAMPLE PERIOD <seconds>
 [FOR <nrounds>]
 [INTERPOLATE <expr>]
 [COMBINE {temporal_agg(<expr>)}] |
 ONCE]
117
Buffer Commands
[AT <pred>:]
CREATE [<type>] BUFFER <name> ({<type>})
PARTITION BY [<expr>]
SIZE [<ntuples>,<nseconds>]
[AS SELECT ...
 [SAMPLE PERIOD <seconds>]]
DROP BUFFER <name>