In-Network Query Processing
Transcript of In-Network Query Processing
In-Network Query Processing
Sam Madden
CS294-19/30/03
Outline
• TinyDB
  – And demo!
• Aggregate Queries
• ACQP
• Break
• Adaptive Operator Placement
• …
Programming Sensor Nets Is Hard
– Months of lifetime required from small batteries– 3-5 days naively; can’t recharge often– Interleave sleep with processing
– Lossy, low-bandwidth, short range communication» Nodes coming and going» Multi-hop
– Remote, zero administration deployments– Highly distributed environment– Limited Development Tools
» Embedded, LEDs for Debugging!
High-Level Abstraction Is Needed!
A Solution: Declarative Queries
• Users specify the data they want
  – Simple, SQL-like queries
  – Using predicates, not specific addresses
  – Our system: TinyDB
• Challenge is to provide:
  – Expressive & easy-to-use interface
  – High-level operators
  – "Transparent optimizations" that many programmers would miss
    • Sensor-net-specific techniques
  – Power-efficient execution framework
TinyDB Demo
TinyDB Architecture
• Components: Query Processor and Schema, running over TinyOS and a multihop network
• Schema: "catalog" of commands & attributes
  – Example attribute entry: Name: temp; Time to sample: 50 uS; Cost to sample: 90 uJ;
    Calibration table: 3; Units: Deg. F; Error: ±5 Deg. F; Get function: getTempFunc()
• Example query plan: get('temp') → Filter (light > 400) → Agg avg(temp)
  – Query: SELECT AVG(temp) WHERE light > 400
  – Results: T:1, AVG: 225; T:2, AVG: 250
• TinyDB footprint:
  – ~10,000 lines embedded C code
  – ~5,000 lines (PC-side) Java
  – ~3,200 bytes RAM (w/ 768-byte heap)
  – ~58 kB compiled code (3x larger than the 2nd-largest TinyOS program)
Declarative Queries for Sensor Networks
• Example: "Find the sensors in bright nests."

  SELECT nodeid, nestNo, light
  FROM sensors
  WHERE light > 400
  EPOCH DURATION 1s

  Epoch  Nodeid  nestNo  Light
  0      1       17      455
  0      2       25      389
  1      1       17      422
  1      2       25      405
Aggregation Queries
• "Count the number of occupied nests in each loud region of the island."

  SELECT region, CNT(occupied), AVG(sound)
  FROM sensors
  GROUP BY region
  HAVING AVG(sound) > 200
  EPOCH DURATION 10s

  Epoch  region  CNT(…)  AVG(…)
  0      North   3       360
  0      South   3       520
  1      North   3       370
  1      South   3       520

• Simpler, ungrouped aggregate (the HAVING clause above restricts output to
  regions w/ AVG(sound) > 200):

  SELECT AVG(sound)
  FROM sensors
  EPOCH DURATION 10s
Benefits of Declarative Queries
• Specification of "whole-network" behavior
• Simple, safe
• Complex behavior via multiple queries, app logic
• Optimizable
  – Exploit (non-obvious) interactions
  – E.g.: ACQP operator ordering, adaptive join operator placement,
    lifetime selection, topology selection
• Versus other approaches, e.g., Diffusion
  – Black-box 'filter' operators
  – Intanagonwiwat et al., "Directed Diffusion", MobiCom 2000
Outline
• TinyDB
  – And demo!
• Aggregate Queries
• ACQP
• Break
• Adaptive Operator Placement
• …
Tiny Aggregation (TAG)
• Not in today's reading
• In-network processing of aggregates
  – Common data analysis operation
    • Aka gather operation or reduction in || programming
  – Communication reducing
    • Operator-dependent benefit
• Exploit query semantics to improve efficiency!

Madden, Franklin, Hellerstein, Hong. Tiny AGgregation (TAG), OSDI 2002.
Query Propagation Via Tree-Based Routing
• Tree-based routing
  – Used in:
    • Query delivery
    • Data collection
  – Topology selection is important; e.g.:
    • Krishnamachari, DEBS 2002; Intanagonwiwat, ICDCS 2002; Heidemann, SOSP 2001
    • LEACH/SPIN, Heinzelman et al., MOBICOM 99
    • SIGMOD 2003
  – Continuous process
    • Mitigates failures

[Figure: a query Q (SELECT …) is broadcast down a routing tree rooted at
node A (children B, C; descendants D, E, F); results R:{…} flow back up
the tree toward the root.]
Basic Aggregation
• In each epoch:
  – Each node samples local sensors once
  – Generates partial state record (PSR)
    • Local readings
    • Readings from children
  – Outputs PSR during assigned comm. interval
    • Communication scheduling for power reduction
• At end of epoch, PSR for whole network output at root
• New result on each successive epoch
• Extras:
  – Predicate-based partitioning via GROUP BY

[Figure: five-node routing tree, node 1 at the root.]
Illustration: Aggregation

SELECT COUNT(*) FROM sensors

[Figure: five animation steps over a sensor-# vs. time matrix and the
five-node routing tree (node 1 at the root). Each node starts with a local
count of 1; in successive communication intervals the leaves report their
PSRs to their parents, partial counts merge and accumulate up the tree, and
in the final step the root outputs COUNT(*) = 5 for the whole network.]
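The epoch illustrated above can be sketched in plain Python. The five-node topology and the child-before-parent reporting order are assumptions for illustration (the slides animate one possible schedule); the merge itself is just integer addition, since COUNT's PSR is a single counter:

```python
def depth(node, parent):
    # Hops from node to the root (any node absent from `parent` is the root).
    d = 0
    while node in parent:
        node = parent[node]
        d += 1
    return d

def count_epoch(parent):
    """One TAG epoch of SELECT COUNT(*): merge child PSRs up the tree."""
    # Every node initializes its PSR to 1 (its own sample).
    psr = {n: 1 for n in set(parent) | set(parent.values())}
    # Deepest nodes report first; each report merges a child PSR into its parent.
    for node in sorted(parent, key=lambda n: depth(n, parent), reverse=True):
        psr[parent[node]] += psr[node]
        del psr[node]
    return psr  # only the root remains, holding the network-wide COUNT(*)

# Tree from the slides: node 1 is the root; 4 routes via 3, 5 via 4.
assert count_epoch({2: 1, 3: 1, 4: 3, 5: 4}) == {1: 5}
```

Each node transmits one fixed-size PSR per epoch regardless of subtree size, which is where the communication reduction comes from.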
Aggregation Framework
• As in extensible databases, TAG supports any aggregation function conforming to:

  Agg_n = {f_init, f_merge, f_evaluate}

  f_init:     {a0}         → <a0>     (a partial state record, PSR)
  f_merge:    {<a1>, <a2>} → <a12>
  f_evaluate: {<a1>}       → aggregate value

• Example: Average

  AVG_init:     {v}                  → <v, 1>
  AVG_merge:    {<S1, C1>, <S2, C2>} → <S1 + S2, C1 + C2>
  AVG_evaluate: {<S, C>}             → S / C

• Restriction: merge must be associative and commutative
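The AVG triple from the slide is short enough to write out directly. A minimal Python sketch (function names are illustrative; the PSR is a (sum, count) pair exactly as above):

```python
def avg_init(v):
    # Turn one reading into a PSR: <sum, count>.
    return (v, 1)

def avg_merge(a, b):
    # Merge two PSRs component-wise: <S1 + S2, C1 + C2>.
    s1, c1 = a
    s2, c2 = b
    return (s1 + s2, c1 + c2)

def avg_evaluate(psr):
    # Final answer at the root: S / C.
    s, c = psr
    return s / c

# Merge order doesn't matter: avg_merge is associative and commutative,
# which is exactly the restriction TAG imposes.
readings = [10, 20, 30]
psr = avg_init(readings[0])
for v in readings[1:]:
    psr = avg_merge(psr, avg_init(v))
assert avg_evaluate(psr) == 20.0
```

Note that naively averaging child averages would be wrong at unbalanced tree nodes; carrying the count in the PSR is what makes the merge safe in any order.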
Types of Aggregates
• SQL supports MIN, MAX, SUM, COUNT, AVERAGE
• Any function over a set can be computed via TAG
• In-network benefit for many operations
  – E.g. standard deviation, top/bottom N, spatial union/intersection, histograms, etc.
  – Compactness of PSR
Taxonomy of Aggregates
• TAG insight: classify aggregates according to various functional properties
  – Yields a general set of optimizations that can automatically be applied
• Properties:
  – Partial state
  – Monotonicity
  – Exemplary vs. summary
  – Duplicate sensitivity
• Drives an API!
Partial State
• Growth of PSR vs. number of aggregated values (n)
  – Distributive: |PSR| = 1 (e.g. MIN)
  – Algebraic: |PSR| = c (e.g. AVG)
  – Holistic: |PSR| = n (e.g. MEDIAN)
  – Unique: |PSR| = d (e.g. COUNT DISTINCT)
    • d = # of distinct values
  – Content-sensitive: |PSR| < n (e.g. HISTOGRAM)
• Affects effectiveness of TAG: MEDIAN needs unbounded state, MAX just 1 record
• "Data Cube", Gray et al.
Benefit of In-Network Processing
• Simulation results: 2500 nodes, 50x50 grid, depth ≈ 10, ~20 neighbors,
  sensor values uniformly distributed over [0,100]

[Figure: bar chart of total bytes transmitted (0-100,000) vs. aggregation
function: EXTERNAL, MAX, AVERAGE, DISTINCT, MEDIAN. MAX is distributive,
AVERAGE algebraic, DISTINCT unique, MEDIAN holistic; the more compact the
PSR class, the fewer bytes transmitted relative to EXTERNAL (no in-network
aggregation).]

• Aggregate- & depth-dependent benefit!
Outline
• TinyDB
  – And demo!
• Aggregate Queries
• ACQP
• Break
• Adaptive Operator Placement
• …
Acquisitional Query Processing (ACQP)
• Traditional DBMS: processes data already in the system
• Acquisitional DBMS: generates the data in the system!
• An acquisitional query processor controls
  – when,
  – where,
  – and with what frequency data is collected
• Versus traditional systems, where data is provided a priori
ACQP: What's Different?
• Basic acquisitional processing
  – Continuous queries, with rates or lifetimes
  – Events for asynchronous triggering
• Avoiding acquisition through optimization
  – Sampling as a query operator
• Choosing where to sample via co-acquisition
  – Index-like data structures
• Acquiring data from the network
  – Prioritization, summary, and rate control
Lifetime Queries
• Lifetime vs. sample rate:

  SELECT …
  EPOCH DURATION 10 s

  SELECT …
  LIFETIME 30 days

• Extra: allow a MAX SAMPLE PERIOD
  – Discard some samples
  – Sampling cheaper than transmitting
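The intuition behind LIFETIME can be made concrete with back-of-the-envelope arithmetic. This is a deliberately simplified sketch, not TinyDB's actual model: it assumes a fixed energy budget in joules and a constant cost per sample-and-transmit epoch, and stretches the affordable number of epochs over the requested lifetime:

```python
def epoch_duration_s(lifetime_days, battery_j, cost_per_epoch_j):
    """Simplistic LIFETIME -> sample period conversion (illustrative only)."""
    lifetime_s = lifetime_days * 24 * 3600
    affordable_epochs = battery_j / cost_per_epoch_j  # how many epochs the budget allows
    return lifetime_s / affordable_epochs             # spread them over the lifetime

# E.g. a 30-day lifetime, a 10 kJ budget, 0.1 J per epoch
# gives roughly one sample every 26 seconds:
period = epoch_duration_s(30, 10_000, 0.1)
assert abs(period - 25.92) < 1e-9
```

A real estimator must also account for baseline sleep current and observed voltage (as the next slide's measured-vs-expected plot shows), but the shape of the tradeoff is the same: longer lifetimes force longer epochs.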
(Single Node) Lifetime Prediction

SELECT nodeid, light
LIFETIME 24 Weeks

[Figure: voltage (raw units) vs. time (hours), measured vs. expected, for a
lifetime goal of 24 weeks (4032 hours, 15 s/sample). The linear fit to the
measured voltage (R² = 0.8455) tracks the expected curve closely over
0-4000 hours; the node stops operating once voltage drops below 350 raw units
("insufficient voltage to operate").]

• E(sampling mag) >> E(sampling light): 1500 uJ vs. 90 uJ
Operator Ordering: Interleave Sampling + Selection

SELECT light, mag
FROM sensors
WHERE pred1(mag)
AND pred2(light)
EPOCH DURATION 1s

• mag sampling is costly; light sampling is cheap
• Traditional DBMS: sample mag and light up front, then apply pred1 and pred2
• ACQP: sample light, apply pred2, and only then sample mag and apply pred1
  – Correct ordering (unless pred1 is very selective and pred2 is not)
• At 1 sample/sec, total power savings could be as much as 3.5 mW,
  comparable to the processor!
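The interleaved plan above amounts to short-circuiting acquisition. A small Python sketch (sampling functions and costs are stand-ins using the slide's 90 uJ / 1500 uJ figures, not real TinyDB APIs):

```python
COSTS_UJ = {"light": 90, "mag": 1500}   # per-sample costs from the slide

def acquire_interleaved(pred_light, pred_mag, sample_light, sample_mag):
    """ACQP-style plan: acquire cheap attribute first, gate the costly one."""
    light = sample_light()              # cheap: 90 uJ
    if not pred_light(light):
        return None                     # expensive mag sample avoided entirely
    mag = sample_mag()                  # costly: 1500 uJ, only when needed
    if not pred_mag(mag):
        return None
    return (light, mag)

calls = {"mag": 0}
def fake_mag():
    calls["mag"] += 1
    return 700

# light = 100 fails (light > 400), so mag is never sampled:
assert acquire_interleaved(lambda l: l > 400, lambda m: m > 500,
                           lambda: 100, fake_mag) is None
assert calls["mag"] == 0
# light = 455 passes, so mag is sampled and tested:
assert acquire_interleaved(lambda l: l > 400, lambda m: m > 500,
                           lambda: 455, fake_mag) == (455, 700)
```

The ordering caveat from the slide shows up directly here: if pred1 (on mag) rejects almost everything while pred2 (on light) rejects nothing, gating on light first saves no mag samples and just adds light samples.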
Exemplary Aggregate Pushdown

SELECT WINMAX(light,8s,8s)
FROM sensors
WHERE mag > x
EPOCH DURATION 1s

• Novel, general pushdown technique
• mag sampling is the most expensive operation!
• Traditional DBMS: sample mag and light, filter on mag > x, then compute WINMAX
• ACQP: push a (light > MAX) check below the mag sample, so mag is acquired
  (and mag > x tested) only when the light reading could change the window max
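One step of the pushed-down plan can be sketched as follows. This is an illustrative reading of the slide, not TinyDB code: `running_max` stands in for the current WINMAX window state, and the mag sampler and predicate are placeholders:

```python
def winmax_step(light, sample_mag, pred_mag, running_max):
    """Exemplary pushdown: a reading can only change WINMAX(light) if it
    beats the current window max, so test that BEFORE the costly mag sample."""
    if light <= running_max:
        return running_max              # expensive mag sample skipped
    if pred_mag(sample_mag()):          # WHERE mag > x, checked only if needed
        return light
    return running_max

window_max = 0
for light in [5, 3, 9, 7]:              # mag stand-in always passes the WHERE
    window_max = winmax_step(light, lambda: 100, lambda mag: mag > 50, window_max)
assert window_max == 9                  # mag sampled only for the 5 and 9 readings
```

This works precisely because WINMAX is exemplary: the answer is one of the input values, so non-improving readings cannot affect it. A summary aggregate like AVG admits no such pushdown, since every reading contributes.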
Event-Based Processing
• Epochs are synchronous
• Might want to issue queries in response to asynchronous events
  – Avoid unnecessary "polling"

  ON EVENT bird-enter(…)
    SELECT b.cnt+1
    FROM birds AS b
    OUTPUT INTO b
    ONCE

  CREATE TABLE birds(uint16 cnt)
    SIZE 1 CIRCULAR

• In-network storage; placement subject to optimization
Attribute-Driven Network Construction
• Goal: co-acquisition (sensors that sample together route together)
• Observation: queries are often over a limited area
  – Or some other subset of the network
  – E.g. regions with light value in [10,20]
• Idea: build network topology such that like-valued nodes route through each other
  – For range queries
  – Relatively static attributes (e.g. location)
• Maintenance issues
Tracking Co-Acquisition Via Semantic Routing Trees
• Idea: send range queries only to participating nodes
  – Parents maintain ranges of descendants
• Precomputed intervals
• Reported by children as they join

  SELECT … WHERE a > 5 AND a < 12

[Figure: root node 0 with children 1, 2, 3 holding intervals a:[1,10],
a:[7,15], a:[20,40] (node 4 below node 1). Node 3's subtree, whose interval
[20,40] does not overlap (5,12), is excluded from both query broadcast and
result collection!]
Parent Selection for SRTs
• Idea: a joining node picks the parent whose subtree interval most overlaps
  its descendants' interval
• Example: node 4, with interval [3,6], chooses among candidate parents with
  intervals [1,10], [7,15], [20,40]:
  – [3,6] ∩ [1,10] = [3,6]
  – [3,6] ∩ [7,15] = ø
  – [3,6] ∩ [20,40] = ø
• Other selection policies in paper!
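The overlap rule from this slide is a one-liner per candidate. A minimal sketch, assuming closed intervals and using the slide's example values (the "most overlap" measure here is intersection length, one reasonable reading of the policy):

```python
def overlap(a, b):
    # Length of the intersection of two closed intervals [lo, hi]; 0 if disjoint.
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return max(0, hi - lo)

def pick_parent(child_interval, candidates):
    # Choose the candidate parent whose interval most overlaps the child's.
    return max(candidates, key=lambda p: overlap(child_interval, candidates[p]))

candidates = {1: (1, 10), 2: (7, 15), 3: (20, 40)}
# [3,6] ∩ [1,10] = [3,6]; the other intersections are empty:
assert pick_parent((3, 6), candidates) == 1
```

Maximizing overlap keeps a parent's advertised range tight, so future range queries exclude its subtree more often; the paper's other policies trade this off differently.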
Simulation Result
• Active nodes vs. query range (uniform value distribution, 20x20 grid,
  ideal connectivity to (8) neighbors)

[Figure: # active nodes (400 = max) vs. query size as a % of the value range
(0, 0.05, 0.1, 0.2, 0.5, 1). Three curves: best case (expected), closest
parent, and all nodes; for small query ranges the SRT policies activate far
fewer than all 400 nodes.]
Outline
• TinyDB
  – And demo!
• Aggregate Queries
• ACQP
• Break
• Adaptive Operator Placement
• …
Adaptive & Decentralized Operator Placement
• IPSN 2003 paper
• Main idea
  – Place operators near data sources
  – Greater operator rate → closer placement
• For each operator
  – Explore candidate neighbors
  – Migrate to lower-cost placements
  – Via extra messages

[Figure: two sources with rates A and B feeding an operator; proper placement
depends on path lengths and relative rates!]
"Adaptivity" in Databases
• Adaptivity: changing query plans on the fly
  – Typically at the physical level
    • Where the plan runs
    • Ordering of operators
    • Instantiations of operators, e.g. hash join vs. merge join
  – Non-traditional
    • Conventionally, complete plans are built prior to execution
    • Using cost estimates (collected from history)
  – Important in volatile or long-running environments
    • Where a priori estimates are unlikely to be good
    • E.g., sensor networks
Adaptivity for Operator Placement
• Adaptivity comes at a cost
  – Extra work on each operator, each tuple
  – In a DBMS, processing per tuple is small
    • 100's of instructions per operator
    • Unless you have to hit the disk!
• Costs in this case?
  – Extra communication hurts
  – Finding candidate placements (exploration)
    • Cost advertisements from local node
    • New costs from candidates
  – Moving state (migration)
    • Joins, windowed aggregates
Do Benefits Justify Costs?
• Not evaluated!
  – 3x reduction in messages vs. external processing
  – Excluding exploration & migration costs
• Seems somewhat implausible, especially given added complexity
  – Hard to make the migration protocol work
    • Depends on the ability to reliably quiesce child operators
• What else could you do?
  – Static placement
Summary
• Declarative QP
  – Simplify data collection in sensornets
  – In-network processing, query optimization for performance
• Acquisitional QP
  – Focus on costs associated with sampling data
  – New challenge of sensornets, other streaming systems?
• Adaptive join placement
  – In-network optimization
  – Some benefit, but practicality unclear
  – Operator pushdown still a good idea
Open Problems
• Many; a few:
  – In-network storage and operator placement
  – Dealing with heterogeneity
  – Dealing with loss
• Need real implementations of many of these ideas
• See me! ([email protected])
Questions / Discussion
Making TinyDB REALLY Work
• Berkeley Botanical Garden• First “real deployment”• Requirements:
– At least 1 month unattended operation– Support for calibrated environmental
sensors– Multi-hop routing
• What we started with:– Limited power management, no time-
synchronization– Motes crashed hard occasionally– Limited, relatively untested multihop
routing
Power Consumption in Sensornets
• Waking current ~12 mA
  – Fairly evenly spread between sensors, processor, radio
• Sleeping current 20-100 uA
• Power consumption dominated by sensing, reception:
  – 1 s power-up on Mica2Dot sensor board
  – Most mote apps use an "always on" radio
    • Completely unstructured communication
    • Bad for battery life
Why Not Use TDMA?
• CSMA is very flexible: easy for new nodes to join
  – Reasonably scalable (relative to Bluetooth)
• CSMA implemented, available
  – We wanted to build something that worked
Power Management Approach
Coarse-grained communication scheduling
1
2
3
4
5
Mote ID
time
Epoch (10s -100s of seconds)
2-4s Waking Period
… zzz … … zzz …
Benefits / Drawbacks
• Benefits
  – Can still use CSMA within the waking period
    • No reservation required: new nodes can join easily!
  – Waking period duration is easily tunable
    • Depending on network size
• Drawbacks
  – Longer waking time vs. TDMA?
    • Could stagger slots based on tree depth
  – No "guaranteed" slot reservation
    • Nothing is guaranteed anyway
Challenges
• Time Synchronization• Fault recovery• Joining, starting, and stopping• Parameter Tuning
– Waking period: Hardcode at 4s?
Time Synchronization
• All messages include a 5 byte time stamp indicating system time in ms– Synchronize (e.g. set local system time to
timestamp) with• Any message from parent• Any new query message (even if not from parent)
– Punt on multiple queries– Timestamps written just after preamble is xmitted
• All nodes agree that the waking period begins when (system time % epoch dur = 0)– And lasts for WAKING_PERIOD ms
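The agreement rule above reduces to one modulo test per node. A sketch with illustrative constants (the 15 s epoch and 3 s waking period are the values the deployment's time-sync test used; times are in ms):

```python
EPOCH_DUR_MS = 15_000      # 15 s epoch duration
WAKING_PERIOD_MS = 3_000   # 3 s waking period

def is_awake(system_time_ms):
    # The waking period starts whenever system time is a multiple of the
    # epoch duration, and lasts WAKING_PERIOD_MS; sleep the rest of the epoch.
    return system_time_ms % EPOCH_DUR_MS < WAKING_PERIOD_MS

assert is_awake(0)              # start of an epoch
assert is_awake(2_999)          # last ms of the waking period
assert not is_awake(3_000)      # asleep for the remaining 12 s
assert is_awake(15_500)         # second epoch's waking period
```

Because every node evaluates the same function of (synchronized) system time, no explicit schedule needs to be exchanged; clock drift between parent messages is the only thing that can pull a node out of the shared waking window.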
Wrinkles
• If a node hasn't heard from its parent for k epochs
  – Switch to "always on" mode for 1 epoch
• If the waking period ends
  – Punt on this epoch
• Query dissemination/deletion via an always-on basestation and viral propagation
  – Data messages act as advertisements for running queries
  – Explicit requests for missing queries
  – Explicit "dead query" messages for stopped queries
    • Don't re-request the last dead query
Results
• 30 nodes in the Garden
  – Lasted 21 days
  – 10% duty cycle (sample period = 30 s, waking period = 3 s)
• Next deployment:
  – 1-2% duty cycle
  – Fixes to clock to reduce baseline power
  – Should last at least 60 days
• Time sync test: 2,500 readings
  – 5 readings "out of synch"
  – 15 s epoch duration, 3 s waking period
Garden Data
• Sensor heights: 36 m (top); 33 m: node 111; 32 m: 110; 30 m: 109, 108, 107;
  20 m: 106, 105, 104; 10 m: 103, 102, 101

[Figure: temperature (C) vs. time, ranging roughly 8-33 C, from 7/7/03 9:40
through 7/9/03 10:03.]
[Figure: relative humidity (%) vs. time, roughly 35-95%, for nodes 101, 104,
109, 110, and 111.]
Data Quality
• Sensor id vs. number of results, summarized by parent sensor

[Figure: stacked bar chart of message counts (0-12,000) for sensor ids 1-16,
each bar broken down by parent sensor (parents 0, 1, 2, 4, 5, 6, 8, 9, 10,
11, 12, 13). The fraction of expected results delivered varies widely per
node: 100% at best, most nodes between roughly 35% and 75%, with outliers at
22% and 3%.]