Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom...
-
Upload
tiffany-melton -
Category
Documents
-
view
226 -
download
1
description
Transcript of Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom...
![Page 1: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/1.jpg)
Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS
Martin TheobaldJennifer Widom
Stanford University
![Page 2: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/2.jpg)
2
Outline
• Motivation and Applications• Working Model for ULDBs• Trio-One System Architecture• Efficient Confidence Computations
![Page 3: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/3.jpg)
3
Motivation
• Many applications involve data that is uncertain
(approximate, probabilistic, inexact, incomplete, imprecise, fuzzy, inaccurate, ...)• Many of the same applications need to track
the lineage of their data
Neither uncertainty nor lineage are supported by conventional Database Management Systems (DBMSs)
Coincidence or Fate?
![Page 4: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/4.jpg)
4
Sample Applications
Deduplication / Entity Resolution• Uncertainty: Match and merge• Lineage: Source records
Information extraction• Uncertainty: Extracted labels and values• Lineage: Original context
Information integration• Uncertainty: Inconsistent information• Lineage: Original sources
![Page 5: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/5.jpg)
5
Sample Applications
Scientific experiments• Uncertainty: Captured (and derived) data• Lineage: Layers of views
Sensor data• Uncertainty: Sensor values, aggregations,
missing readings• Lineage: Original readings, views
![Page 6: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/6.jpg)
6
Claim
The connection between uncertainty and lineage goes deeper than just a shared need by several applications
Coincidence or Fate?
![Page 7: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/7.jpg)
7
Lineage and Uncertainty
Lineage...• Enables simple and consistent representation of
uncertain data• Allows for intuitive and complete working models• Correlates uncertainty in query results with
uncertainty in the input data• Can make confidence computations over
uncertain data more efficient
Applications use lineage to reduce or resolve uncertainty
![Page 8: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/8.jpg)
8
Goal
A new kind of DBMS in which:1. Data2. Uncertainty3. Lineage
are all first-class interrelated concepts
Trio }
![Page 9: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/9.jpg)
9
The Trio Trio
1. Data Model Simplest extension to relational model that’s
sufficiently expressive
2. Query Language Simple extension to SQL with well-defined
semantics and intuitive behavior
3. System A complete open-source DBMS that people
want to use
![Page 10: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/10.jpg)
10
The Present
1. Data Model Uncertainty-Lineage Databases (ULDBs)
2. Query Language TriQL
3. System Trio-One: Layered prototype deployable
on top of any standard relational DBMS
![Page 11: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/11.jpg)
11
Running Example: Crime-Solving
Saw(witness,car) // may be uncertainDrives(person,car) // may be uncertain
Suspects(person) = πperson(Saw ⋈car Drives)
![Page 12: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/12.jpg)
12
Data Model: Uncertainty
An uncertain database represents a set ofpossible instances of data values
• Amy saw either a Honda or a Toyota• Jimmy drives a Toyota, a Mazda, or both• Betty saw an Acura with confidence 0.5 or a
Toyota with confidence 0.3• Hank is a suspect with confidence 0.7
![Page 13: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/13.jpg)
13
Our Model for Uncertainty
1. Alternatives (mutually exclusive)2. ‘?’ (Maybe) Annotations3. Confidences
![Page 14: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/14.jpg)
14
Our Model for Uncertainty
1. Alternatives: uncertainty about data value(s)
2. ‘?’ (Maybe) Annotations3. Confidences
Saw (witness,car)(Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy,
Mazda)
witness carAmy { Honda, Toyota,
Mazda }=
Three possibleinstances
![Page 15: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/15.jpg)
15
Six possibleinstances
Our Model for Uncertainty
1. Alternatives2. ‘?’ (Maybe): uncertainty about presence3. Confidences
Saw (witness,car)(Amy, Honda) ∥ (Amy, Toyota) ∥ (Amy,
Mazda)(Betty, Acura)
?
![Page 16: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/16.jpg)
16
Our Model for Uncertainty
1. Alternatives2. ‘?’ (Maybe) Annotations3. Confidences: weighted uncertainty
Saw (witness,car)(Amy, Honda): 0.5 ∥ (Amy,Toyota): 0.3 ∥ (Amy,
Mazda): 0.2(Betty, Acura): 0.6
?
Six possible instances, each with a probability
![Page 17: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/17.jpg)
17
Models for Uncertainty
• Our model (so far) is not especially new• We spent some time exploring the space of
models for uncertainty• Tension between understandability and
expressiveness– Our model is understandable– But it is not complete, or even closed under
common operations
![Page 18: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/18.jpg)
18
Closure and Completeness
Completeness Can represent any finite set of possible
instancesClosure Can represent any result of relational
operators
Note: Completeness Closure
![Page 19: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/19.jpg)
19
Models for Uncertainty• A-Tuples – Uncertainty on attribute values
• (Amy, {Toyota, Mazda})• Not closed
• X-Tuples – Uncertainty on entire tuples• { (Amy, Toyota), (Amy, Mazda) }• Still not closed
• C-Tables – Tuples + Boolean constraints• { (Amy, Toyota, X=1),
(Amy, Mazda, X<>1), (Cathy, Honda, X<>1) }
• Closed and complete!• Understandable?
![Page 20: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/20.jpg)
20
Our Model is Not Closed
Saw (witness,car)(Cathy, Honda) ∥ (Cathy,
Mazda)
Drives (person,car)(Jimmy, Toyota) ∥ (Jimmy,
Mazda)(Billy, Honda) ∥ (Frank, Honda)
(Hank, Honda)
SuspectsJimmy
Billy ∥ FrankHank
Suspects = πperson(Saw ⋈car Drives)
???
Does not correctlycapture possibleinstances in theresult
CANNOT
![Page 21: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/21.jpg)
21
to the Rescue
Lineage (provenance): “where data came from”• Internal lineage – pointers to base data• External lineage – files, URLs, web portals, etc.
In Trio: A Boolean function λ from alternatives to other alternatives (or external sources)
Lineage
![Page 22: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/22.jpg)
22
Example with Lineage
ID Saw (witness,car)11
(Cathy, Honda) ∥ (Cathy, Mazda)
ID Drives (person,car)21
(Jimmy, Toyota) ∥ (Jimmy, Mazda)
22
(Billy, Honda) ∥ (Frank, Honda)
23
(Hank, Honda)
ID Suspects31
Jimmy
32
Billy ∥ Frank
33
Hank
???
Suspects = πperson(Saw ⋈car Drives)λ(31) = (11,2),(21,2)
λ(32,1) = (11,1),(22,1); λ(32,2) = (11,1),(22,2)λ(33) = (11,1), 23
Correctly captures possible instances inthe result
![Page 23: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/23.jpg)
23
Trio Data Model
1. Alternatives (mutually exclusive) 2. ‘?’ (Maybe) Annotations 3. Confidences 4. Lineage
ULDBs are closed and complete
Uncertainty-Lineage Databases (ULDBs)
![Page 24: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/24.jpg)
24
ULDBs: Lineage
Conjunctive lineage sufficient for most operations• Disjunctive lineage for duplicate-elimination• Negative lineage for difference
![Page 25: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/25.jpg)
25
ULDBs: Minimality
A ULDB relation R represents a set of possible
instancesDoes every tuple in R appear in some
possible instance? (no extraneous tuples)
Does every maybe-tuple in R not appear in some possible instance? (no extraneous ‘?’s)
Also
Data-minimality
Lineage-minimality
![Page 26: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/26.jpg)
26
Data Minimality ExamplesExtraneous ‘?’
. . .
10
(Billy, Honda) ∥ (Frank, Honda)
. . .
. . .
40
Billy ∥ Frank
. . .
?λ(40,1)=(10,1); λ(40,2)=(10,2)extraneous
. . .
20
(Billy, Honda)
. . .
?. . .
30
(Frank, Honda)
. . .
?
![Page 27: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/27.jpg)
27
Data Minimality ExamplesExtraneous tuple
(Diane, Mazda) ∥ (Diane, Acura)
Dianeextraneous
(Diane, Mazda)
(Diane, Acura)
?
??
![Page 28: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/28.jpg)
28
ULDBs: Membership Questions
Does a given tuple t appear in some (all) possible instance(s) of R ?
Is a given table T one of (all of) the possible instances of R ?
Polynomial algorithms based on data-minimization
NP-Hard
![Page 29: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/29.jpg)
29
Non-Theorists...
Wake Back Up!
![Page 30: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/30.jpg)
30
Querying ULDBs
• Simple extension to SQL• Formal semantics, intuitive meaning• Query uncertainty, confidences, and
lineage
TriQL
![Page 31: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/31.jpg)
31
Initial TriQL Example
ID Saw (witness,car)11
(Cathy, Honda) ∥ (Cathy, Mazda)
ID Drives (person,car)21
(Jimmy, Toyota) ∥ (Jimmy, Mazda)
22
(Billy, Honda) ∥ (Frank, Honda)
23
(Hank, Honda)SELECT Drives.person INTO SuspectsFROM Saw, DrivesWHERE Saw.car = Drives.car
ID Suspects31
Jimmy
32
Billy ∥ Frank
33
Hank
???
λ(31) = (11,2),(21,2)λ(32,1)=(11,1),(22,1); λ(32,2)=(11,1),(22,2)
λ(33) = (11,1), 23
![Page 32: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/32.jpg)
32
Formal Semantics
Query Q on ULDB D
D
D1, D2, …, Dn
possibleinstances
Q on eachinstance
representationof instances
Q(D1), Q(D2), …, Q(Dn)
D’implementation of Q
operational semanticsD + Result
![Page 33: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/33.jpg)
33
Operational Semantics
Over conventional relational database:For each tuple in cross-product of X1, X2, ..., Xn
1. Evaluate the predicate2. If true, project attr-list to create result tuple3. If INTO clause, insert into table
SELECT attr-list [ INTO table ]FROM X1, X2, ..., XnWHERE predicate
![Page 34: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/34.jpg)
34
Operational Semantics
Over ULDB:For each tuple in cross-product of X1, X2, ..., Xn
1. Create “super tuple” T from all combinations of alternatives
2. Evaluate predicate on each alternative in T ; keep only the true ones
3. Project attr-list on each alternative to create result tuple
4. Details: ‘?’, lineage, confidences
SELECT attr-list [ INTO table ]FROM X1, X2, ..., XnWHERE predicate
![Page 35: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/35.jpg)
35
Operational Semantics: Example
SELECT Drives.personFROM Saw, DrivesWHERE Saw.car = Drives.car
Saw (witness,car)(Cathy, Honda) ∥ (Cathy,
Mazda)
Drives (person,car)(Jim, Mazda) ∥ (Bill, Mazda)
(Hank, Honda)
![Page 36: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/36.jpg)
36
Operational Semantics: Example
SELECT Drives.personFROM Saw, DrivesWHERE Saw.car = Drives.car
Saw (witness,car)(Cathy, Honda) ∥ (Cathy,
Mazda)
Drives (person,car)(Jim, Mazda) ∥ (Bill, Mazda)
(Hank, Honda)
(Cathy,Honda,Jim,Mazda)∥(Cathy,Honda,Bill,Mazda)∥(Cathy,Mazda,Jim,Mazda)∥(Cathy,Mazda,Bill,Mazda)
![Page 37: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/37.jpg)
37
Operational Semantics: Example
SELECT Drives.personFROM Saw, DrivesWHERE Saw.car = Drives.car
Saw (witness,car)(Cathy, Honda) ∥ (Cathy,
Mazda)
Drives (person,car)(Jim, Mazda) ∥ (Bill, Mazda)
(Hank, Honda)
(Cathy,Honda,Jim,Mazda)∥(Cathy,Honda,Bill,Mazda)∥(Cathy,Mazda,Jim,Mazda)∥(Cathy,Mazda,Bill,Mazda)
![Page 38: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/38.jpg)
38
Operational Semantics: Example
SELECT Drives.personFROM Saw, DrivesWHERE Saw.car = Drives.car
Saw (witness,car)(Cathy, Honda) ∥ (Cathy,
Mazda)
Drives (person,car)(Jim, Mazda) ∥ (Bill, Mazda)
(Hank, Honda)
(Cathy,Honda,Jim,Mazda)∥(Cathy,Honda,Bill,Mazda)∥(Cathy,Mazda,Jim,Mazda)∥(Cathy,Mazda,Bill,Mazda)
![Page 39: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/39.jpg)
39
Operational Semantics: Example
(Cathy,Honda,Jim,Mazda)∥(Cathy,Honda,Bill,Mazda)∥(Cathy,Mazda,Jim,Mazda)∥(Cathy,Mazda,Bill,Mazda)
SELECT Drives.personFROM Saw, DrivesWHERE Saw.car = Drives.car
Saw (witness,car)(Cathy, Honda) ∥ (Cathy,
Mazda)
Drives (person,car)(Jim, Mazda) ∥ (Bill, Mazda)
(Hank, Honda)
(Cathy,Honda,Hank,Honda) ∥ (Cathy,Mazda,Hank,Honda)
![Page 40: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/40.jpg)
40
Operational Semantics: Example
(Cathy,Honda,Jim,Mazda)∥(Cathy,Honda,Bill,Mazda)∥(Cathy,Mazda,Jim,Mazda)∥(Cathy,Mazda,Bill,Mazda)
SELECT Drives.personFROM Saw, DrivesWHERE Saw.car = Drives.car
Saw (witness,car)(Cathy, Honda) ∥ (Cathy,
Mazda)
Drives (person,car)(Jim, Mazda) ∥ (Bill, Mazda)
(Hank, Honda)
(Cathy,Honda,Hank,Honda) ∥ (Cathy,Mazda,Hank,Honda)
![Page 41: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/41.jpg)
41
Operational Semantics: Example
(Cathy,Honda,Jim,Mazda)∥(Cathy,Honda,Bill,Mazda)∥(Cathy,Mazda,Jim,Mazda)∥(Cathy,Mazda,Bill,Mazda)
SELECT Drives.personFROM Saw, DrivesWHERE Saw.car = Drives.car
Saw (witness,car)(Cathy, Honda) ∥ (Cathy,
Mazda)
Drives (person,car)(Jim, Mazda) ∥ (Bill, Mazda)
(Hank, Honda)
(Cathy,Honda,Hank,Honda) ∥ (Cathy,Mazda,Hank,Honda)
![Page 42: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/42.jpg)
42
Operational Semantics: Example
SELECT Drives.person INTO SuspectsFROM Saw, DrivesWHERE Saw.car = Drives.car
Saw (witness,car)(Cathy, Honda) ∥ (Cathy,
Mazda)
Drives (person,car)(Jim, Mazda) ∥ (Bill, Mazda)
(Hank, Honda)
SuspectsJim ∥ Bill
Hank??
λ( ) = ...λ( ) = ...
![Page 43: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/43.jpg)
43
Confidences
Confidences supplied with base dataTrio computes confidences on query results
• Default probabilistic interpretation• Can choose to plug in different arithmetic
Saw (witness,car)(Cathy, Honda): 0.6 ∥ (Cathy,
Mazda): 0.4
Drives (person,car)(Jim, Mazda): 0.3 ∥ (Bill, Mazda):
0.6(Hank, Honda) ): 1.0
SuspectsJim: 0.12 ∥ Bill: 0.24
Hank: 0.6??
?
0.3 0.4 0.6
Probabilistic Min
![Page 44: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/44.jpg)
44
TriQL: Querying Confidences
Built-in function: Conf() SELECT Drives.person INTO Suspects FROM Saw, Drives WHERE Saw.car = Drives.car AND Conf(Saw) > 0.5 AND Conf(Drives) > 0.8
SELECT Drives.person INTO Suspects FROM Saw, Drives WHERE Saw.car = Drives.car AND Conf(*) > 0.4
![Page 45: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/45.jpg)
45
TriQL: Querying Lineage
Built-in join predicate: Lineage() SELECT Saw.witness INTO AccusesHank FROM Suspects, Saw WHERE Lineage(Suspects,Saw) AND Suspects.person = ‘Hank’
Also works for transitive lineage: SELECT witness FROM AccusesHank WHERE Lineage(Saw,AccusesHank)
![Page 46: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/46.jpg)
46
Additional Query Constructs
• “Horizontal subqueries”Refer to tuple alternatives as a relation
• Unmerged (allow horizontal duplicates)• Flatten, GroupAlts
• NoLineage, NoConf, NoMaybe
• Query-computed confidences• SQL-like data modification statements
![Page 47: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/47.jpg)
47
Final Example Query
PrimeSuspect (crime#, accuser, suspect)(1, Amy, Jimmy) ∥ (1, Betty, Billy) ∥ (1, Cathy,
Hank)(2, Cathy, Frank) ∥ (2, Betty, Freddy)
person score
Amy 10Betty 15Cathy 5
SuspectsJimmy: 0.33 ∥ Billy: 0.5 ∥ Hank: 0.166
Frank: 0.25 ∥ Freddy: 0.75
Credibility
List suspects with conf values based on accuser credibility
![Page 48: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/48.jpg)
48
Final Example Query
PrimeSuspect (crime#, accuser, suspect)(1, Amy, Jimmy) ∥ (1, Betty, Billy) ∥ (1, Cathy,
Hank)(2, Cathy, Frank) ∥ (2, Betty, Freddy)
person score
Amy 10Betty 15Cathy 5
SuspectsJimmy: 0.33 ∥ Billy: 0.5 ∥ Hank: 0.166
Frank: 0.25 ∥ Freddy: 0.75
Credibility
SELECT suspect, score/[sum(score)] as confFROM (SELECT suspect, (SELECT score FROM Credibility C WHERE C.person = P.accuser) FROM PrimeSuspect P)
![Page 49: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/49.jpg)
49
Final Example Query
PrimeSuspect (crime#, accuser, suspect)(1, Amy, Jimmy) ∥ (1, Betty, Billy) ∥ (1, Cathy,
Hank)(2, Cathy, Frank) ∥ (2, Betty, Freddy)
person score
Amy 10Betty 15Cathy 5
SuspectsJimmy: 0.33 ∥ Billy: 0.5 ∥ Hank: 0.166
Frank: 0.25 ∥ Freddy: 0.75
Credibility
SELECT suspect, score/[sum(score)] as confFROM (SELECT suspect, (SELECT score FROM Credibility C WHERE C.person = P.accuser) FROM PrimeSuspect P)
![Page 50: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/50.jpg)
50
Final Example Query
PrimeSuspect (crime#, accuser, suspect)(1, Amy, Jimmy) ∥ (1, Betty, Billy) ∥ (1, Cathy,
Hank)(2, Cathy, Frank) ∥ (2, Betty, Freddy)
person score
Amy 10Betty 15Cathy 5
SuspectsJimmy: 0.33 ∥ Billy: 0.5 ∥ Hank: 0.166
Frank: 0.25 ∥ Freddy: 0.75
Credibility
SELECT suspect, score/[sum(score)] as confFROM (SELECT suspect, (SELECT score FROM Credibility C WHERE C.person = P.accuser) FROM PrimeSuspect P)
“Scaled as conf”
“Uniform as conf”
…
![Page 51: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/51.jpg)
51
The Trio System
Version 1Entirely on top of a conventional DBMSSurprisingly easy and complete, reasonably
efficient
![Page 52: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/52.jpg)
52
The Trio System: Version 1
Lin:RaidF
r tableaidT
o
R aid xid C
Relational DBMS
create trio table T(A,B)select C into R ...
TrioMetadat
a
Trio APISQL commands
• Result cursors• Traverse lineage
T aid xid A B
![Page 53: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/53.jpg)
53
Relation Encoding
aid xid
conf
person
car
21 1 2 Jim Mazda22 1 2 Bill Mazda23 2 1 Hank Honda
SuspectsJim ∥ Bill
Hank
aid xid conf person
31 1 3 Jim32 1 3 Bill33 2 2 Hank
Saw (witness,car)(Cathy, Honda) ∥ (Cathy,
Mazda)
Drives (person,car)(Jim, Mazda) ∥ (Bill, Mazda)
(Hank, Honda)
aid xid conf witness
car
11 1 2 Cathy Honda12 1 2 Cathy Mazda
??
aidFr
table aidTo
31 Saw 1231 Drives 2132 Saw 1232 Drives 2233 Saw 1133 Drives 23
Saw Drives
SuspectsLin:Suspects
![Page 54: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/54.jpg)
54
Relation Encoding with Confidences
aid xid
conf
person car
21 1 0.3 Jim Mazda22 1 0.6 Bill Mazda23 2 1.0 Hank Honda
aid xid conf person
31 1 NULL Jim32 1 NULL Bill33 2 NULL Hank
Saw (witness,car)(Cathy, Honda): 0.6 ∥ (Cathy,
Mazda): 0.4
Drives (person,car)(Jim, Mazda): 0.3 ∥ (Bill, Mazda):
0.6(Hank, Honda)
aid xid conf witness
car
11 1 0.6 Cathy Honda12 1 0.4 Cathy Mazda
Saw Drives
Suspects
0.12
0.24
0.6
![Page 55: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/55.jpg)
55
Query TranslationPersistent results: Query Q into result table R
1. Run query Q’ to produce “super-result” R Q’ ≈ Q but adds aid/xid’s of source tuples, joins lineage tables for lineage() predicates
2. Group R into alternatives, generate new xid’s3. Materialze lineage data into lin:R4. Compute confidences? (optional)5. Add/update metadata: schemas, confidence info, lineage structure
Transient results: stop at 2, return cursor over new xid’s
![Page 56: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/56.jpg)
56
The Trio System: Version 1
R aid xid C
Relational DBMS
create trio table T(A,B)select C into R ...
TrioMetadat
a
Trio APISQL commands
• Result cursors• Traverse lineage
T aid xid A B
Lin:RaidF
r tableaidT
o
![Page 57: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/57.jpg)
57
The Trio System: Version 1
Trio API and translator(TriQL parser in Python)
TrioMetadat
a
TrioExplorer(GUI client)
Trio Stored
Procedures
EncodedData
TablesLineageTables
Standard SQL
• Data & system columns• A-IDs for alternatives• “Verticalized” through shared X-IDs• Confidences
• One lineage table per derived table• Connecting A-IDs
• Table types• Schema-level lineage structure
• conf()• lineage()
• TriQL queries• DDL commands• Schema browsing• Table browsing• Explore lineage• On-demand confidence computation
Command-lineclient
Relational DBMS
SPI-Plugin for
PostgreSQLReplacing
“Fetch-All”
![Page 58: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/58.jpg)
58
The Trio System: Version 2
Trio API and translator
TrioExplorer(GUI client)Command-line
client
Trio DBMSSpecialized
Trio Data Structures
SpecializedTrio Processing & Optimizing
Standard SQL Direct function calls
![Page 59: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/59.jpg)
59
Efficient Confidence Computation
Previous approach (probabilistic databases)• Each operator computes confidences during
query execution• Only certain (“safe”) query plans allowed
In ULDBs Confidence of alternative A is function of
confidences in λ(A)
Our approach• Decoupled data and confidence computation• Allow any query plan for data computation• Compute confidences on-demand based on
lineage
![Page 60: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/60.jpg)
60
CIDs & Confidence Memoization
Identify set of Closest Independent Descendants (CIDs) for each relation in the schema-level lineage DAG relations with no shared ancestors
Batched confidence computations in SQL• Select unions of distinct base alternatives from
all root-to-leaf lineage paths• With CIDs and memoization enabled
– if confidences are memoized then stop at CIDs – else traverse down to the leafs (base data) and memoize at CIDs
(ongoing work)
![Page 61: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/61.jpg)
61
“Probabilistic TPC-H” Setup
• Horizontal split of TPC-H tables
• Uniform-random number of alternatives [1-10]
• Uniform confidences
PartsuppsLineitemsOrders
O2O1
O L P
OL LP
OLP1.5M
1.0M 0.1M
1.0M 0.5M 0.7M
O3 L2L1 L3 L4 P2P1 P3
O L P
CID(OLP) = {O, L, P}
0.8M6M1.5M
![Page 62: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/62.jpg)
62
Confidence Computation Times
0
500
1,000
1,500
2,000
2,500
3,000
3,500
4,000
4,500
5,000
Exec
utio
n Ti
me
(sec
onds
)
O L P OL LP OLP Total
No CIDs
CIDs
![Page 63: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/63.jpg)
63
Per-Tuple vs. Batching (no CIDs)
![Page 64: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/64.jpg)
64
Memoization & Join Multiplicity
![Page 65: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/65.jpg)
65
Current Topics
System• Full query language• Nice interface• Performance experiments• Demo applications
Algorithms: confidence & lineage computation,
extraneous data, membership questions• Minimize lineage traversal• Confidence memoization• Efficient batch computations
![Page 66: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/66.jpg)
66
Future Directions
Theory, Model, Algorithms• Unlimited opportunities
System• Storage, indexing, partitioning• Statistics and query optimization
More features• Top-k by confidence • Incomplete relations; continuous uncertainty;
correlated uncertainty• External lineage; update lineage; versioning
![Page 67: Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS Martin Theobald Jennifer Widom Stanford University.](https://reader035.fdocuments.net/reader035/viewer/2022062503/5a4d1ad07f8b9ab05997109a/html5/thumbnails/67.jpg)
Search “stanford trio”Current Trio team:
Parag Agrawal, Anish Das Sarma, Raghotham Murthy, Michi Mutsuzaki, Shubha Nabar, Tomoe
Sugihara, Martin Theobald, Jennifer Widom
Special thanks to:Ashok Chandra, Alon Halevy, Jeff Ullman,
Omar Benjelloun
but don’t forgetthe lineage…