Probabilistic Data Management for the Digital Home:
The HeisenData Project
Minos Garofalakis (Intel Research Berkeley)
UC Berkeley: Prof. Joe Hellerstein, Prof. Mike Franklin, Daisy Wang, Eirinaios Michelakis
September 22, 2006
Stanford UPDB Meeting, September 22, 2006
The Home, Year 2020
1000s of sensors (light, temp, sound, motion, location, …)
100s of actuators (locks, switches, heating, water, …)
Masses of data (“Traditional” data plus sensor streams)
Does a lot for you: security, HVAC, energy management & demand-response, entertainment, …
– Use Alice’s motion patterns to activate electrical devices (e.g., water heating)
– Correlate user motions with existing patterns to detect “suspicious” behavior
Not Just “Pie-in-the-Sky” Fantasy
Many prototypes: GaTech, MIT, U Colorado, UT Arlington, Orange, Philips, MSR
Advances in statistical machine learning: activity recognition, DARPA Grand Challenge, image recognition
Advances in sensing/actuation: wireless, nano tech, sensor data fusion, real-world applications now
Statistical learning techniques enabling rapid advance
But most efforts are stand-alone, “point” solutions
Thesis: A “Smart” Home Must
1. Handle uncertainty and correlation (probabilistic reasoning)
– P( sensor 2455 fired accurately ) > .8
– P( someone in den | behavior of sensors ) > .95
– P( Bob in den | Bob’s recent history ) < .05
– P( Bob eating dinner | house global state ) > .75
– P( Bob is happy | years observing Bob ) > .8
A hierarchy of inferences, from minute to abstract
Recognize, manage, and exploit correlations (spatial, temporal)
Example App: People Tracking
Figure labels: motion sensors M1 and M2, RFID reader, door sensor; people Alice and Bob; the sensors are correlated
Sample event:
“Bob is at the front door” with high confidence
Thesis: A “Smart” Home Must
2. Share its knowledge across applications
Security, HVAC, entertainment, etc., apps need to
Share all objects (sensors, floor plan, people, …)
Share common models of the world, e.g.,
– When does Bob usually come home on Tue?
– What’s a typical Sunday like?
– Who’s in the kitchen now?
Thesis: A “Smart” Home Must
3. Support both real-time & retrospective reasoning
Example real-time reasoning
– Fire alarm
– Intruder alert
Example retrospective reasoning
– Turn on hot water heater just-in-time
– Automatically detect and enter “vacation mode”
Existing Approaches
Uncertainty in DB management systems
– Simple uncertainty models
– Independent tuples, only limited correlation modeling
– Attaching probabilities at the wrong granularity
ML and “intelligent environment” app areas
– No sharing of data or models among apps
– Hard-wired world models, difficult to code/update
Existing Approaches (contd.)
All interesting data processing done outside the database!
– Lose all key benefits of a DBMS (declarative querying, persistence, optimization, …)
– No sharing of data/knowledge/abstractions, duplication of effort
Figure: Sensor/RFID streams (+ metadata, floor plans, …) → raw data tables in a relational DBMS → SELECT * FROM RAWDATA → INPUT FILE → … → OUTPUT FILE
Raw data table:
time  id  temp
10am  1   20
10am  2   21
..    ..  …
10am  7   29
The HeisenData Project
Integrated data-management & probabilistic-reasoning platform
Push stat learning functions inside the DBMS
– Model learning, inference, querying, …
– Uncertainty, correlations and probabilistic reasoning as “first-class citizens”
Provide high-level declarative interface, persistence, optimization, … for
– Probabilistic models of the world & inference queries
– Object/event hierarchies (w/ basic “out of the box” objects)
HeisenDB Engine: basis for ML app development
HeisenData Model
(Evidence + Model) define a probability distribution over “possible worlds”
Complete data model
Figure: Evidence table(s) + hierarchical FO graphical model (nodes T1, T2, T3, T4, V1, V3) → “possible worlds”, labeled Prob (World | Evidence) and Model
Evidence table (XXX or blank: value not available):
time  id  temp  volt
10am  1   20    2.5
10am  2   21    XXX
..    ..  …
10am  7         2.8
One of the three possible worlds shown (completed table):
time  id  temp  volt
10am  1   20    2.5
10am  2   21    2.7
..    ..  …
10am  7   26    2.8
The three worlds shown have sensor 7 temp = 26, 28, 26 and are labeled Prob=0.4, Prob=0.3, Prob=0.3
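A minimal sketch of the "evidence + model = distribution over possible worlds" idea, assuming the model has already been reduced to a joint distribution over the missing cells. The names `evidence`, `joint_over_missing`, and `possible_worlds` are hypothetical, and the completion values are made up for illustration (only the probabilities 0.4 / 0.3 / 0.3 echo the slide); this is not the HeisenData implementation.

```python
from copy import deepcopy

# Evidence table: observed readings; None marks a value that is not available.
evidence = [
    {"time": "10am", "id": 1, "temp": 20, "volt": 2.5},
    {"time": "10am", "id": 2, "temp": 21, "volt": None},
    {"time": "10am", "id": 7, "temp": None, "volt": 2.8},
]

# Assumed model output: joint distribution over the missing cells given the
# evidence. Keys are (volt of sensor 2, temp of sensor 7); values are made up.
joint_over_missing = {
    (2.7, 26): 0.4,
    (2.7, 28): 0.3,
    (2.6, 26): 0.3,
}

def possible_worlds(evidence, joint):
    """Return a list of (completed_table, probability) pairs."""
    # Locate missing cells in row-major order; joint keys follow this order.
    holes = [(i, col) for i, row in enumerate(evidence)
             for col, val in row.items() if val is None]
    worlds = []
    for values, prob in joint.items():
        world = deepcopy(evidence)          # never mutate the evidence itself
        for (i, col), v in zip(holes, values):
            world[i][col] = v
        worlds.append((world, prob))
    return worlds

worlds = possible_worlds(evidence, joint_over_missing)
for table, prob in worlds:
    print(prob, [(r["id"], r["temp"], r["volt"]) for r in table])
```

Each world is a fully specified relational table, so an ordinary query can be evaluated on it; the probabilities over worlds sum to 1.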
Probabilistic Graphical Models 101
Nodes = Random Variables (RVs); Edges capture direct correlations
Parameterization = factor table for each clique
– “Marginal probability distribution” (in general, “correlation strengths”)
– Concise representation of multidimensional joint pdf
Probabilistic inference: Conditioning, marginalization, MAP estimation, …
Figure: graph over nodes T1, T2, T3, T4, with one factor table per edge:
T1 T2 P
21 22 0.5
22 23 0.2
.. .. …
24 27 0.1
T1 T3 P
21 23 0.2
22 21 0.3
.. .. …
24 29 0.2
. . . . . .
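The inference operations named above (conditioning, marginalization, MAP estimation) can be sketched by brute force on a toy model. This assumes two pairwise factors, (T1, T2) and (T1, T3), over tiny made-up domains; all numbers and function names are illustrative assumptions, and real engines avoid this full enumeration.

```python
from itertools import product

# Toy domains and factor tables; all numbers are illustrative assumptions.
domain = {"T1": [21, 22], "T2": [22, 23], "T3": [21, 23]}
factors = [
    (("T1", "T2"), {(21, 22): 0.5, (21, 23): 0.1, (22, 22): 0.1, (22, 23): 0.3}),
    (("T1", "T3"), {(21, 21): 0.4, (21, 23): 0.1, (22, 21): 0.2, (22, 23): 0.3}),
]
names = list(domain)

def joint():
    """Normalized joint distribution: product of all factor entries."""
    weights = {}
    for values in product(*(domain[n] for n in names)):
        assign = dict(zip(names, values))
        w = 1.0
        for scope, table in factors:
            w *= table[tuple(assign[v] for v in scope)]
        weights[values] = w
    z = sum(weights.values())               # normalization constant
    return {values: w / z for values, w in weights.items()}

def condition(given):
    """Keep assignments consistent with `given`, then renormalize."""
    kept = {v: p for v, p in joint().items()
            if all(dict(zip(names, v))[k] == x for k, x in given.items())}
    z = sum(kept.values())
    return {v: p / z for v, p in kept.items()}

def marginal(var, given=None):
    """P(var | given): sum out every other variable."""
    out = {}
    i = names.index(var)
    for values, p in condition(given or {}).items():
        out[values[i]] = out.get(values[i], 0.0) + p
    return out

def map_estimate(given=None):
    """Most probable full assignment consistent with `given`."""
    dist = condition(given or {})
    return dict(zip(names, max(dist, key=dist.get)))

print(marginal("T1"))
print(marginal("T1", {"T2": 23}))
print(map_estimate({"T2": 23}))
```

The factor tables store 8 numbers here while the joint has 8 entries, so nothing is saved at this size; the concise-representation benefit shows up as the number of variables grows and the joint grows exponentially while the factor tables do not.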
Hierarchical FO Graphical Models
Goal: Capture correlations at the right abstraction level
Semantic hierarchy of RV entities (GROUP-BYs)
– In general, RVs can be defined as “slices” over the table schema
– Probabilistic correlations expressed at a level are quantified over all descendant RVs
– Can also have exceptions/overrides at finer resolutions
Cleaner, more intuitive probabilistic models
More opportunities for optimizing probabilistic inference
Figure: semantic hierarchy with Temperature (T) at the top, LivingRoom (TL) and Bathroom (TB) below it, and leaf RVs T1, T2, T3, T4, T5; a second panel shows nodes T and V above T1 to T5
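The "quantified over all descendants, with overrides at finer resolutions" rule can be illustrated with a small lookup, under simplifying assumptions: the assignment of sensors to rooms is a guess, and a per-node Gaussian-style parameter set (`mean`, `std`) stands in for the correlation parameters of the actual model. All names and values are hypothetical.

```python
# Hypothetical hierarchy: child -> parent. Which sensors sit in which room
# is an assumption made for this example.
parent = {
    "TL": "T", "TB": "T",
    "T1": "TL", "T2": "TL", "T3": "TL",
    "T4": "TB", "T5": "TB",
}

# Parameters stated at some level of the hierarchy (made-up values).
params = {
    "T":  {"mean": 21.0, "std": 2.0},   # applies to every temperature RV
    "TB": {"mean": 24.0},               # room-level override
    "T5": {"std": 0.5},                 # sensor-level exception
}

def resolve(node):
    """Effective parameters for `node`: inherited top-down, finest level wins."""
    chain = []
    while node is not None:
        chain.append(node)
        node = parent.get(node)         # None once the root is passed
    resolved = {}
    for n in reversed(chain):           # root first, then finer levels
        resolved.update(params.get(n, {}))
    return resolved

for sensor in ["T1", "T4", "T5"]:
    print(sensor, resolve(sensor))
```

Stating a parameter once at `T` covers five sensors; only the exceptions need their own entries, which is one reading of why the slide calls such models cleaner.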
HeisenData Query Processing
“Possible worlds”: Clean semantics but impractical!
Perform all query processing (relational & inference operators) over evidence + model
Figure: two routes from (evidence tables + probabilistic model) to the resulting possible-worlds distribution
– Expand to the distribution of possible worlds, then run relational queries for each world: INFEASIBLE, exponential explosion!!
– Run relational & inference queries over evidence + model to get (evidence tables + result model), then infer: FAST!!
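A small sketch of why the two routes differ in cost, assuming (for brevity only) that the missing readings are independent; the slides stress that real models are correlated. The query is the expected average temperature; `marginals` and both function names are hypothetical.

```python
from itertools import product

# Assumed per-sensor distributions for three missing temperature readings.
marginals = [
    {20: 0.5, 21: 0.5},
    {21: 0.2, 22: 0.8},
    {26: 0.7, 28: 0.3},
]

def expected_avg_by_worlds(marginals):
    """Expand every possible world, run the query in each, and weight it."""
    total, n_worlds = 0.0, 0
    for world in product(*(m.items() for m in marginals)):
        prob = 1.0
        for _, p in world:
            prob *= p                       # independence assumption
        avg = sum(v for v, _ in world) / len(world)
        total += prob * avg
        n_worlds += 1
    return total, n_worlds

def expected_avg_by_model(marginals):
    """Answer the same query on the model, using linearity of expectation."""
    means = [sum(v * p for v, p in m.items()) for m in marginals]
    return sum(means) / len(means)

by_worlds, n_worlds = expected_avg_by_worlds(marginals)
by_model = expected_avg_by_model(marginals)
print(n_worlds, by_worlds, by_model)
```

With k candidate values for each of n missing cells, the first function visits k^n worlds while the second touches n·k numbers; both return the same answer for this query.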
Query Processing over HFO Models
Query processing algebra = both traditional relational operators and probabilistic inference
– Simple example query: Find most probable sensor readings for tomorrow, and join them with last week’s averages
– How to cost, optimize, and process such queries?
– Operate over both HFO factor tables and evidence
– “Open” inference primitives for optimizer (cost, ordering, etc.), access structures, …
Relational operators over HFO models
– Output: Model for the possible worlds in the relational result
– Non-trivial: the different granularities of RVs in the model can complicate things even for simple operations
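The simple example query (most probable readings for tomorrow, joined with last week's averages) might be composed from one inference operator and one relational operator roughly as follows. This is a sketch under stated assumptions: `forecast` stands in for whatever the model predicts, per-sensor argmax equals the joint MAP only if sensors are treated as independent, and every name and number is hypothetical.

```python
# Assumed model output: per-sensor distribution over tomorrow's temperature.
forecast = {
    1: {20: 0.6, 21: 0.4},
    2: {21: 0.3, 22: 0.7},
    7: {26: 0.55, 28: 0.45},
}

# Ordinary relational table (made-up values).
last_week_avg = [
    {"id": 1, "avg_temp": 19.5},
    {"id": 2, "avg_temp": 21.0},
    {"id": 7, "avg_temp": 27.2},
]

def most_probable(forecast):
    """Inference operator: one row per sensor with its most probable reading."""
    rows = []
    for sensor_id, dist in forecast.items():
        temp = max(dist, key=dist.get)
        rows.append({"id": sensor_id, "temp": temp, "prob": dist[temp]})
    return rows

def hash_join(left, right, key):
    """Relational operator: equi-join two lists of rows on `key`."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

result = hash_join(most_probable(forecast), last_week_avg, "id")
for row in result:
    print(row)
```

An optimizer that treats `most_probable` as an opaque step can only run it before the join; exposing its cost and ordering properties, as the slide suggests, would let it consider other plans.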
Challenges: Theoretical & Practical
What is the right language/algebra/interface?
– Completeness, soundness
– Expressiveness & ease of use
Query Processing & Optimization
– Inference is expensive!
– How to optimize & process probabilistic queries with relational and inference operators?
– How to index/summarize/sketch probabilistic data?
– Physical DB design (indexes, access structs, views, …)?
– CPU Intensive: Exploiting parallelism and many-core
Efficient hierarchical model learning & maintenance
Summary
HeisenData Engine: A base for “intelligent environment” application development
Handles real-world uncertainty and correlations
Pushes statistical learning tools into a DBMS
Sits between home storage management functions and “intelligent” applications
Thank you!
minos.garofalakis@intel.com
http://berkeley.intel-research.net/minos/