In-Network Query Processing
Transcript of In-Network Query Processing
In-Network Query Processing
Sam Madden
CS294-19/30/03
Outline
• TinyDB
  – And demo!
• Aggregate Queries
• ACQP
• Break
• Adaptive Operator Placement
• …
Programming Sensor Nets Is Hard
– Months of lifetime required from small batteries– 3-5 days naively; can’t recharge often– Interleave sleep with processing
– Lossy, low-bandwidth, short range communication» Nodes coming and going» Multi-hop
– Remote, zero administration deployments– Highly distributed environment– Limited Development Tools
» Embedded, LEDs for Debugging!
High-Level Abstraction Is Needed!
A Solution: Declarative Queries
• Users specify the data they want
  – Simple, SQL-like queries
  – Using predicates, not specific addresses
  – Our system: TinyDB
• Challenge is to provide:
  – Expressive & easy-to-use interface
  – High-level operators
  – "Transparent optimizations" that many programmers would miss
    • Sensor-net-specific techniques
  – Power-efficient execution framework
TinyDB Demo
TinyDB Architecture
• Components: Query Processor and Schema, running over TinyOS and a multihop network
• Schema: "catalog" of commands & attributes
  – Example attribute entry: Name: temp; Time to sample: 50 uS; Cost to sample: 90 uJ;
    Calibration table: 3; Units: Deg. F; Error: ±5 Deg. F; Get function: getTempFunc()
• Example query plan: get('temp') → Filter (light > 400) → Agg avg(temp)
  – Query: SELECT AVG(temp) WHERE light > 400
  – Results: T:1, AVG: 225; T:2, AVG: 250
• TinyDB footprint:
  – ~10,000 lines embedded C code
  – ~5,000 lines (PC-side) Java
  – ~3,200 bytes RAM (w/ 768-byte heap)
  – ~58 kB compiled code (3x larger than the 2nd-largest TinyOS program)
Declarative Queries for Sensor Networks
• Example: "Find the sensors in bright nests."

  SELECT nodeid, nestNo, light
  FROM sensors
  WHERE light > 400
  EPOCH DURATION 1s

  Epoch  Nodeid  nestNo  Light
  0      1       17      455
  0      2       25      389
  1      1       17      422
  1      2       25      405
Aggregation Queries
• "Count the number of occupied nests in each loud region of the island."

  SELECT region, CNT(occupied), AVG(sound)
  FROM sensors
  GROUP BY region
  HAVING AVG(sound) > 200
  EPOCH DURATION 10s

  Epoch  region  CNT(…)  AVG(…)
  0      North   3       360
  0      South   3       520
  1      North   3       370
  1      South   3       520

• Simpler, ungrouped aggregate (the HAVING clause above restricts output to
  regions w/ AVG(sound) > 200):

  SELECT AVG(sound)
  FROM sensors
  EPOCH DURATION 10s
Benefits of Declarative Queries
• Specification of "whole-network" behavior
• Simple, safe
• Complex behavior via multiple queries, app logic
• Optimizable
  – Exploit (non-obvious) interactions
  – E.g.: ACQP operator ordering, adaptive join operator placement,
    lifetime selection, topology selection
• Versus other approaches, e.g., Diffusion
  – Black-box 'filter' operators
  – Intanagonwiwat et al., "Directed Diffusion", MobiCom 2000
Outline
• TinyDB
  – And demo!
• Aggregate Queries
• ACQP
• Break
• Adaptive Operator Placement
• …
Tiny Aggregation (TAG)
• Not in today's reading
• In-network processing of aggregates
  – Common data analysis operation
    • Aka gather operation or reduction in || programming
  – Communication reducing
    • Operator-dependent benefit
• Exploit query semantics to improve efficiency!

Madden, Franklin, Hellerstein, Hong. Tiny AGgregation (TAG), OSDI 2002.
Query Propagation Via Tree-Based Routing
• Tree-based routing
  – Used in:
    • Query delivery
    • Data collection
  – Topology selection is important; e.g.:
    • Krishnamachari, DEBS 2002; Intanagonwiwat, ICDCS 2002; Heidemann, SOSP 2001
    • LEACH/SPIN, Heinzelman et al., MOBICOM 99
    • SIGMOD 2003
  – Continuous process
    • Mitigates failures

[Figure: a query Q (SELECT …) is broadcast down a routing tree rooted at
node A (children B, C; descendants D, E, F); results R:{…} flow back up
the tree toward the root.]
Basic Aggregation
• In each epoch:
  – Each node samples local sensors once
  – Generates partial state record (PSR)
    • Local readings
    • Readings from children
  – Outputs PSR during assigned comm. interval
    • Communication scheduling for power reduction
• At end of epoch, PSR for whole network output at root
• New result on each successive epoch
• Extras:
  – Predicate-based partitioning via GROUP BY

[Figure: five-node routing tree, node 1 at the root.]
Illustration: Aggregation

SELECT COUNT(*) FROM sensors

[Figure: five animation steps over a sensor-# vs. time matrix and the
five-node routing tree (node 1 at the root). Each node starts with a local
count of 1; in successive communication intervals the leaves report their
PSRs to their parents, partial counts merge and accumulate up the tree, and
in the final step the root outputs COUNT(*) = 5 for the whole network.]
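The epoch illustrated above can be sketched in plain Python. The five-node topology and the child-before-parent reporting order are assumptions for illustration (the slides animate one possible schedule); the merge itself is just integer addition, since COUNT's PSR is a single counter:

```python
def depth(node, parent):
    # Hops from node to the root (any node absent from `parent` is the root).
    d = 0
    while node in parent:
        node = parent[node]
        d += 1
    return d

def count_epoch(parent):
    """One TAG epoch of SELECT COUNT(*): merge child PSRs up the tree."""
    # Every node initializes its PSR to 1 (its own sample).
    psr = {n: 1 for n in set(parent) | set(parent.values())}
    # Deepest nodes report first; each report merges a child PSR into its parent.
    for node in sorted(parent, key=lambda n: depth(n, parent), reverse=True):
        psr[parent[node]] += psr[node]
        del psr[node]
    return psr  # only the root remains, holding the network-wide COUNT(*)

# Tree from the slides: node 1 is the root; 4 routes via 3, 5 via 4.
assert count_epoch({2: 1, 3: 1, 4: 3, 5: 4}) == {1: 5}
```

Each node transmits one fixed-size PSR per epoch regardless of subtree size, which is where the communication reduction comes from.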
Aggregation Framework
• As in extensible databases, TAG supports any aggregation function conforming to:

  Agg_n = {f_init, f_merge, f_evaluate}

  f_init:     {a0}         → <a0>     (a partial state record, PSR)
  f_merge:    {<a1>, <a2>} → <a12>
  f_evaluate: {<a1>}       → aggregate value

• Example: Average

  AVG_init:     {v}                  → <v, 1>
  AVG_merge:    {<S1, C1>, <S2, C2>} → <S1 + S2, C1 + C2>
  AVG_evaluate: {<S, C>}             → S / C

• Restriction: merge must be associative and commutative
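The AVG triple from the slide is short enough to write out directly. A minimal Python sketch (function names are illustrative; the PSR is a (sum, count) pair exactly as above):

```python
def avg_init(v):
    # Turn one reading into a PSR: <sum, count>.
    return (v, 1)

def avg_merge(a, b):
    # Merge two PSRs component-wise: <S1 + S2, C1 + C2>.
    s1, c1 = a
    s2, c2 = b
    return (s1 + s2, c1 + c2)

def avg_evaluate(psr):
    # Final answer at the root: S / C.
    s, c = psr
    return s / c

# Merge order doesn't matter: avg_merge is associative and commutative,
# which is exactly the restriction TAG imposes.
readings = [10, 20, 30]
psr = avg_init(readings[0])
for v in readings[1:]:
    psr = avg_merge(psr, avg_init(v))
assert avg_evaluate(psr) == 20.0
```

Note that naively averaging child averages would be wrong at unbalanced tree nodes; carrying the count in the PSR is what makes the merge safe in any order.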
Types of Aggregates
• SQL supports MIN, MAX, SUM, COUNT, AVERAGE
• Any function over a set can be computed via TAG
• In-network benefit for many operations
  – E.g. standard deviation, top/bottom N, spatial union/intersection, histograms, etc.
  – Compactness of PSR
Taxonomy of Aggregates
• TAG insight: classify aggregates according to various functional properties
  – Yields a general set of optimizations that can automatically be applied
• Properties:
  – Partial state
  – Monotonicity
  – Exemplary vs. summary
  – Duplicate sensitivity
• Drives an API!
Partial State
• Growth of PSR vs. number of aggregated values (n)
  – Distributive: |PSR| = 1 (e.g. MIN)
  – Algebraic: |PSR| = c (e.g. AVG)
  – Holistic: |PSR| = n (e.g. MEDIAN)
  – Unique: |PSR| = d (e.g. COUNT DISTINCT)
    • d = # of distinct values
  – Content-sensitive: |PSR| < n (e.g. HISTOGRAM)
• Affects effectiveness of TAG: MEDIAN needs unbounded state, MAX just 1 record
• "Data Cube", Gray et al.
Benefit of In-Network Processing
• Simulation results: 2500 nodes, 50x50 grid, depth ≈ 10, ~20 neighbors,
  sensor values uniformly distributed over [0,100]

[Figure: bar chart of total bytes transmitted (0-100,000) vs. aggregation
function: EXTERNAL, MAX, AVERAGE, DISTINCT, MEDIAN. MAX is distributive,
AVERAGE algebraic, DISTINCT unique, MEDIAN holistic; the more compact the
PSR class, the fewer bytes transmitted relative to EXTERNAL (no in-network
aggregation).]

• Aggregate- & depth-dependent benefit!
Outline
• TinyDB
  – And demo!
• Aggregate Queries
• ACQP
• Break
• Adaptive Operator Placement
• …
Acquisitional Query Processing (ACQP)
• Traditional DBMS: processes data already in the system
• Acquisitional DBMS: generates the data in the system!
• An acquisitional query processor controls
  – when,
  – where,
  – and with what frequency data is collected
• Versus traditional systems, where data is provided a priori
ACQP: What's Different?
• Basic acquisitional processing
  – Continuous queries, with rates or lifetimes
  – Events for asynchronous triggering
• Avoiding acquisition through optimization
  – Sampling as a query operator
• Choosing where to sample via co-acquisition
  – Index-like data structures
• Acquiring data from the network
  – Prioritization, summary, and rate control
Lifetime Queries
• Lifetime vs. sample rate:

  SELECT …
  EPOCH DURATION 10 s

  SELECT …
  LIFETIME 30 days

• Extra: allow a MAX SAMPLE PERIOD
  – Discard some samples
  – Sampling cheaper than transmitting
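The intuition behind LIFETIME can be made concrete with back-of-the-envelope arithmetic. This is a deliberately simplified sketch, not TinyDB's actual model: it assumes a fixed energy budget in joules and a constant cost per sample-and-transmit epoch, and stretches the affordable number of epochs over the requested lifetime:

```python
def epoch_duration_s(lifetime_days, battery_j, cost_per_epoch_j):
    """Simplistic LIFETIME -> sample period conversion (illustrative only)."""
    lifetime_s = lifetime_days * 24 * 3600
    affordable_epochs = battery_j / cost_per_epoch_j  # how many epochs the budget allows
    return lifetime_s / affordable_epochs             # spread them over the lifetime

# E.g. a 30-day lifetime, a 10 kJ budget, 0.1 J per epoch
# gives roughly one sample every 26 seconds:
period = epoch_duration_s(30, 10_000, 0.1)
assert abs(period - 25.92) < 1e-9
```

A real estimator must also account for baseline sleep current and observed voltage (as the next slide's measured-vs-expected plot shows), but the shape of the tradeoff is the same: longer lifetimes force longer epochs.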
(Single Node) Lifetime Prediction

SELECT nodeid, light
LIFETIME 24 Weeks

[Figure: voltage (raw units) vs. time (hours), measured vs. expected, for a
lifetime goal of 24 weeks (4032 hours, 15 s/sample). The linear fit to the
measured voltage (R² = 0.8455) tracks the expected curve closely over
0-4000 hours; the node stops operating once voltage drops below 350 raw units
("insufficient voltage to operate").]

• E(sampling mag) >> E(sampling light): 1500 uJ vs. 90 uJ
Operator Ordering: Interleave Sampling + Selection

SELECT light, mag
FROM sensors
WHERE pred1(mag)
AND pred2(light)
EPOCH DURATION 1s

• mag sampling is costly; light sampling is cheap
• Traditional DBMS: sample mag and light up front, then apply pred1 and pred2
• ACQP: sample light, apply pred2, and only then sample mag and apply pred1
  – Correct ordering (unless pred1 is very selective and pred2 is not)
• At 1 sample/sec, total power savings could be as much as 3.5 mW,
  comparable to the processor!
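The interleaved plan above amounts to short-circuiting acquisition. A small Python sketch (sampling functions and costs are stand-ins using the slide's 90 uJ / 1500 uJ figures, not real TinyDB APIs):

```python
COSTS_UJ = {"light": 90, "mag": 1500}   # per-sample costs from the slide

def acquire_interleaved(pred_light, pred_mag, sample_light, sample_mag):
    """ACQP-style plan: acquire cheap attribute first, gate the costly one."""
    light = sample_light()              # cheap: 90 uJ
    if not pred_light(light):
        return None                     # expensive mag sample avoided entirely
    mag = sample_mag()                  # costly: 1500 uJ, only when needed
    if not pred_mag(mag):
        return None
    return (light, mag)

calls = {"mag": 0}
def fake_mag():
    calls["mag"] += 1
    return 700

# light = 100 fails (light > 400), so mag is never sampled:
assert acquire_interleaved(lambda l: l > 400, lambda m: m > 500,
                           lambda: 100, fake_mag) is None
assert calls["mag"] == 0
# light = 455 passes, so mag is sampled and tested:
assert acquire_interleaved(lambda l: l > 400, lambda m: m > 500,
                           lambda: 455, fake_mag) == (455, 700)
```

The ordering caveat from the slide shows up directly here: if pred1 (on mag) rejects almost everything while pred2 (on light) rejects nothing, gating on light first saves no mag samples and just adds light samples.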
Exemplary Aggregate Pushdown

SELECT WINMAX(light,8s,8s)
FROM sensors
WHERE mag > x
EPOCH DURATION 1s

• Novel, general pushdown technique
• mag sampling is the most expensive operation!
• Traditional DBMS: sample mag and light, filter on mag > x, then compute WINMAX
• ACQP: push a (light > MAX) check below the mag sample, so mag is acquired
  (and mag > x tested) only when the light reading could change the window max
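One step of the pushed-down plan can be sketched as follows. This is an illustrative reading of the slide, not TinyDB code: `running_max` stands in for the current WINMAX window state, and the mag sampler and predicate are placeholders:

```python
def winmax_step(light, sample_mag, pred_mag, running_max):
    """Exemplary pushdown: a reading can only change WINMAX(light) if it
    beats the current window max, so test that BEFORE the costly mag sample."""
    if light <= running_max:
        return running_max              # expensive mag sample skipped
    if pred_mag(sample_mag()):          # WHERE mag > x, checked only if needed
        return light
    return running_max

window_max = 0
for light in [5, 3, 9, 7]:              # mag stand-in always passes the WHERE
    window_max = winmax_step(light, lambda: 100, lambda mag: mag > 50, window_max)
assert window_max == 9                  # mag sampled only for the 5 and 9 readings
```

This works precisely because WINMAX is exemplary: the answer is one of the input values, so non-improving readings cannot affect it. A summary aggregate like AVG admits no such pushdown, since every reading contributes.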
Event-Based Processing
• Epochs are synchronous
• Might want to issue queries in response to asynchronous events
  – Avoid unnecessary "polling"

  ON EVENT bird-enter(…)
    SELECT b.cnt+1
    FROM birds AS b
    OUTPUT INTO b
    ONCE

  CREATE TABLE birds(uint16 cnt)
    SIZE 1 CIRCULAR

• In-network storage; placement subject to optimization
Attribute-Driven Network Construction
• Goal: co-acquisition (sensors that sample together route together)
• Observation: queries are often over a limited area
  – Or some other subset of the network
  – E.g. regions with light value in [10,20]
• Idea: build network topology such that like-valued nodes route through each other
  – For range queries
  – Relatively static attributes (e.g. location)
• Maintenance issues
Tracking Co-Acquisition Via Semantic Routing Trees
• Idea: send range queries only to participating nodes
  – Parents maintain ranges of descendants
• Precomputed intervals
• Reported by children as they join

  SELECT … WHERE a > 5 AND a < 12

[Figure: root node 0 with children 1, 2, 3 holding intervals a:[1,10],
a:[7,15], a:[20,40] (node 4 below node 1). Node 3's subtree, whose interval
[20,40] does not overlap (5,12), is excluded from both query broadcast and
result collection!]
Parent Selection for SRTs
• Idea: a joining node picks the parent whose subtree interval most overlaps
  its descendants' interval
• Example: node 4, with interval [3,6], chooses among candidate parents with
  intervals [1,10], [7,15], [20,40]:
  – [3,6] ∩ [1,10] = [3,6]
  – [3,6] ∩ [7,15] = ø
  – [3,6] ∩ [20,40] = ø
• Other selection policies in paper!
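The overlap rule from this slide is a one-liner per candidate. A minimal sketch, assuming closed intervals and using the slide's example values (the "most overlap" measure here is intersection length, one reasonable reading of the policy):

```python
def overlap(a, b):
    # Length of the intersection of two closed intervals [lo, hi]; 0 if disjoint.
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return max(0, hi - lo)

def pick_parent(child_interval, candidates):
    # Choose the candidate parent whose interval most overlaps the child's.
    return max(candidates, key=lambda p: overlap(child_interval, candidates[p]))

candidates = {1: (1, 10), 2: (7, 15), 3: (20, 40)}
# [3,6] ∩ [1,10] = [3,6]; the other intersections are empty:
assert pick_parent((3, 6), candidates) == 1
```

Maximizing overlap keeps a parent's advertised range tight, so future range queries exclude its subtree more often; the paper's other policies trade this off differently.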
Simulation Result
• Active nodes vs. query range (uniform value distribution, 20x20 grid,
  ideal connectivity to (8) neighbors)

[Figure: # active nodes (400 = max) vs. query size as a % of the value range
(0, 0.05, 0.1, 0.2, 0.5, 1). Three curves: best case (expected), closest
parent, and all nodes; for small query ranges the SRT policies activate far
fewer than all 400 nodes.]
Outline
• TinyDB
  – And demo!
• Aggregate Queries
• ACQP
• Break
• Adaptive Operator Placement
• …
Adaptive & Decentralized Operator Placement
• IPSN 2003 paper
• Main idea
  – Place operators near data sources
  – Greater operator rate → closer placement
• For each operator
  – Explore candidate neighbors
  – Migrate to lower-cost placements
  – Via extra messages

[Figure: two sources with rates A and B feeding an operator; proper placement
depends on path lengths and relative rates!]
"Adaptivity" in Databases
• Adaptivity: changing query plans on the fly
  – Typically at the physical level
    • Where the plan runs
    • Ordering of operators
    • Instantiations of operators, e.g. hash join vs. merge join
  – Non-traditional
    • Conventionally, complete plans are built prior to execution
    • Using cost estimates (collected from history)
  – Important in volatile or long-running environments
    • Where a priori estimates are unlikely to be good
    • E.g., sensor networks
Adaptivity for Operator Placement
• Adaptivity comes at a cost
  – Extra work on each operator, each tuple
  – In a DBMS, processing per tuple is small
    • 100's of instructions per operator
    • Unless you have to hit the disk!
• Costs in this case?
  – Extra communication hurts
  – Finding candidate placements (exploration)
    • Cost advertisements from local node
    • New costs from candidates
  – Moving state (migration)
    • Joins, windowed aggregates
Do Benefits Justify Costs?
• Not evaluated!
  – 3x reduction in messages vs. external processing
  – Excluding exploration & migration costs
• Seems somewhat implausible, especially given added complexity
  – Hard to make the migration protocol work
    • Depends on the ability to reliably quiesce child operators
• What else could you do?
  – Static placement
Summary
• Declarative QP
  – Simplify data collection in sensornets
  – In-network processing, query optimization for performance
• Acquisitional QP
  – Focus on costs associated with sampling data
  – New challenge of sensornets, other streaming systems?
• Adaptive join placement
  – In-network optimization
  – Some benefit, but practicality unclear
  – Operator pushdown still a good idea
Open Problems
• Many; a few:
  – In-network storage and operator placement
  – Dealing with heterogeneity
  – Dealing with loss
• Need real implementations of many of these ideas
• See me! ([email protected])
Questions / Discussion
Making TinyDB REALLY Work
• Berkeley Botanical Garden• First “real deployment”• Requirements:
– At least 1 month unattended operation– Support for calibrated environmental
sensors– Multi-hop routing
• What we started with:– Limited power management, no time-
synchronization– Motes crashed hard occasionally– Limited, relatively untested multihop
routing
Power Consumption in Sensornets
• Waking current ~12 mA
  – Fairly evenly spread between sensors, processor, radio
• Sleeping current 20-100 uA
• Power consumption dominated by sensing, reception:
  – 1 s power-up on Mica2Dot sensor board
  – Most mote apps use an "always on" radio
    • Completely unstructured communication
    • Bad for battery life
Why Not Use TDMA?
• CSMA is very flexible: easy for new nodes to join
  – Reasonably scalable (relative to Bluetooth)
• CSMA implemented, available
  – We wanted to build something that worked
Power Management Approach
Coarse-grained communication scheduling
1
2
3
4
5
Mote ID
time
Epoch (10s -100s of seconds)
2-4s Waking Period
… zzz … … zzz …
Benefits / Drawbacks
• Benefits
  – Can still use CSMA within the waking period
    • No reservation required: new nodes can join easily!
  – Waking period duration is easily tunable
    • Depending on network size
• Drawbacks
  – Longer waking time vs. TDMA?
    • Could stagger slots based on tree depth
  – No "guaranteed" slot reservation
    • Nothing is guaranteed anyway
Challenges
• Time Synchronization• Fault recovery• Joining, starting, and stopping• Parameter Tuning
– Waking period: Hardcode at 4s?
Time Synchronization
• All messages include a 5 byte time stamp indicating system time in ms– Synchronize (e.g. set local system time to
timestamp) with• Any message from parent• Any new query message (even if not from parent)
– Punt on multiple queries– Timestamps written just after preamble is xmitted
• All nodes agree that the waking period begins when (system time % epoch dur = 0)– And lasts for WAKING_PERIOD ms
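The agreement rule above reduces to one modulo test per node. A sketch with illustrative constants (the 15 s epoch and 3 s waking period are the values the deployment's time-sync test used; times are in ms):

```python
EPOCH_DUR_MS = 15_000      # 15 s epoch duration
WAKING_PERIOD_MS = 3_000   # 3 s waking period

def is_awake(system_time_ms):
    # The waking period starts whenever system time is a multiple of the
    # epoch duration, and lasts WAKING_PERIOD_MS; sleep the rest of the epoch.
    return system_time_ms % EPOCH_DUR_MS < WAKING_PERIOD_MS

assert is_awake(0)              # start of an epoch
assert is_awake(2_999)          # last ms of the waking period
assert not is_awake(3_000)      # asleep for the remaining 12 s
assert is_awake(15_500)         # second epoch's waking period
```

Because every node evaluates the same function of (synchronized) system time, no explicit schedule needs to be exchanged; clock drift between parent messages is the only thing that can pull a node out of the shared waking window.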
Wrinkles
• If a node hasn't heard from its parent for k epochs
  – Switch to "always on" mode for 1 epoch
• If the waking period ends
  – Punt on this epoch
• Query dissemination/deletion via an always-on basestation and viral propagation
  – Data messages act as advertisements for running queries
  – Explicit requests for missing queries
  – Explicit "dead query" messages for stopped queries
    • Don't re-request the last dead query
Results
• 30 nodes in the Garden
  – Lasted 21 days
  – 10% duty cycle (sample period = 30 s, waking period = 3 s)
• Next deployment:
  – 1-2% duty cycle
  – Fixes to clock to reduce baseline power
  – Should last at least 60 days
• Time sync test: 2,500 readings
  – 5 readings "out of synch"
  – 15 s epoch duration, 3 s waking period
Garden Data
• Sensor heights: 36 m (top); 33 m: node 111; 32 m: 110; 30 m: 109, 108, 107;
  20 m: 106, 105, 104; 10 m: 103, 102, 101

[Figure: temperature (C) vs. time, ranging roughly 8-33 C, from 7/7/03 9:40
through 7/9/03 10:03.]
[Figure: relative humidity (%) vs. time, roughly 35-95%, for nodes 101, 104,
109, 110, and 111.]
Data Quality
• Sensor id vs. number of results, summarized by parent sensor

[Figure: stacked bar chart of message counts (0-12,000) for sensor ids 1-16,
each bar broken down by parent sensor (parents 0, 1, 2, 4, 5, 6, 8, 9, 10,
11, 12, 13). The fraction of expected results delivered varies widely per
node: 100% at best, most nodes between roughly 35% and 75%, with outliers at
22% and 3%.]