Lineage Tracing in Data Warehouses Yingwei Cui Stanford University Database Group.
Date posted: 21-Dec-2015
Lineage Tracing in Data Warehouses
Yingwei Cui
Stanford University Database Group
2
Motivation: Data Warehousing
Data Warehouse
Source 1 Source 2 Source 3
Lucrative Fields
Databases $8800K Theory $320K
Networks $800K
Students Enrollments Courses
Wow?!
Databases $8800K
3
Courses Enrollments Students
Oh, I see...
Source 1 Source 2 Source 3
Lineage Tracer
Data Warehouse
Lucrative Fields
Database 1800 Theory $320K
Networks $800K Databases $8800K
Enrollments: (CS145, Ted), (CS154, Joe), (CS244, Bob), (CS145, Ann), (CS245, Jane), …
Students: (Bob, MS, $1K), (Jane, Web, $5K), (Ann, BS, $1K), (Joe, BS, $1K), (Ted, Web, $5K), …
Courses: (CS145, Databases), (CS154, Theory), (CS244, Networks), (CS245, Databases)
4
The Data Lineage Problem
Data warehouses integrate data from multiple sources for analysis and mining
Data lineage: given data item o in the warehouse, which data items in the sources were used to derive o?
Sometimes called “drill-through” in industry
5
Challenges
Warehouse of relational views over relational sources
– What is a good formal definition for lineage?
– How do we trace data lineage for arbitrary views?
– How do we make it efficient?
Warehouse defined by graph of data transformations
– No fixed, well-defined relational operators
– Large transformation sequences and graphs
6
Contributions
Thesis contributions
– Basics of lineage tracing for relational views [TODS’00]
– Lineage tracing system prototype [ICDE’00 demo]
– Performance study and optimizations [ICDE’00, DMDW’00]
– Lineage tracing for general data transformations [VLDB’01]
– View update for deletions using data lineage [TechReport’01]
Other contributions (joint with others)
– Data warehousing performance issue [VLDB’00]
– Data management for wireless networks [Infocom’98, Globecom’97]
7
Outline of Talk
Part 1: Lineage tracing for relational views
Part 2: Lineage tracing for general data transformations
Part 3: View update for deletions using data lineage (time permitting)
8
Part 1: Lineage Tracing for Relational Views
Declarative definition of data lineage
Lineage tracing algorithms
Using auxiliary views for efficient lineage tracing
Experimental results (small sample)
9
Views We Consider
Relational algebra
Arbitrary use of aggregation
Set semantics
Also in thesis
– Set operators
– Bag semantics
R S T
V
10
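Simple Lineage Example
V = $\alpha_{Y,\,sum(Z)}(\sigma_{X>Z}(R \bowtie S))$
R(X, Y): (3, a), (8, b)
S(Y, Z): (a, 2), (b, 0), (b, 9), (b, 6)
T = $R \bowtie S$ (X, Y, Z): (3, a, 2), (8, b, 0), (8, b, 9), (8, b, 6)
U = $\sigma_{X>Z}(T)$ (X, Y, Z): (3, a, 2), (8, b, 0), (8, b, 6)
V(Y, sum): (a, 2), (b, 6)
Highlighted lineage of (b, 6): (8, b, 0), (8, b, 6) in U and T; (8, b) in R; (b, 0), (b, 6) in S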
11
Lineage for Relational Operators
Unary relational operators
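[Figure: op over input R, with R* inside R producing output tuple t]
Lineage of t according to op is the maximal subset $R^* \subseteq R$ such that
(1) $op(R^*) = \{t\}$
(2) $\forall t^* \in R^*: op(\{t^*\}) \neq \emptyset$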
12
Lineage for Relational Operators
Example 1: $\sigma_{X>Z}$
R(X, Y, Z): (3, a, 2), (8, b, 0), (8, b, 9), (8, b, 6)
$\sigma_{X>Z}(R)$ (X, Y, Z): (3, a, 2), (8, b, 0), (8, b, 6)
Highlighted: output tuple (8, b, 6) and its lineage (8, b, 6) in R
Lineage of t according to op is the maximal subset $R^* \subseteq R$ such that
(1) $op(R^*) = \{t\}$
(2) $\forall t^* \in R^*: op(\{t^*\}) \neq \emptyset$
13
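Lineage for Relational Operators
Example 2: $\alpha_{Y,\,sum(Z)}$
R(X, Y, Z): (3, a, 2), (8, b, 0), (8, b, 6)
$\alpha_{Y,\,sum(Z)}(R)$ (Y, sum): (a, 2), (b, 6)
Highlighted: output tuple (b, 6) and its lineage (8, b, 0), (8, b, 6) in R
Lineage of t according to op is the maximal subset $R^* \subseteq R$ such that
(1) $op(R^*) = \{t\}$
(2) $\forall t^* \in R^*: op(\{t^*\}) \neq \emptyset$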
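The definition can be checked directly on the two examples. The following is a minimal Python sketch, not the tracing algorithm from the talk: it brute-forces the maximal subset by testing conditions (1) and (2) on subsets from largest to smallest. The function names `lineage`, `select_x_gt_z`, and `sum_z_by_y` are illustrative assumptions.

```python
from itertools import combinations

def lineage(op, R, t):
    """Brute-force search for the maximal R* satisfying conditions (1) and (2)."""
    rows = list(R)
    for size in range(len(rows), 0, -1):           # largest subsets first
        for subset in combinations(rows, size):
            r_star = set(subset)
            # (1) op(R*) = {t}; (2) every tuple of R* alone yields a non-empty result
            if op(r_star) == {t} and all(op({row}) for row in r_star):
                return r_star
    return set()

def select_x_gt_z(rows):
    """Selection X > Z over (X, Y, Z) tuples."""
    return {(x, y, z) for (x, y, z) in rows if x > z}

def sum_z_by_y(rows):
    """Aggregation: group by Y, sum Z."""
    totals = {}
    for (_x, y, z) in rows:
        totals[y] = totals.get(y, 0) + z
    return set(totals.items())

R1 = {(3, "a", 2), (8, "b", 0), (8, "b", 9), (8, "b", 6)}   # Example 1 input
R2 = {(3, "a", 2), (8, "b", 0), (8, "b", 6)}                # Example 2 input

print(lineage(select_x_gt_z, R1, (8, "b", 6)))
print(lineage(sum_z_by_y, R2, ("b", 6)))
```

The search is exponential in |R| and is only meant to make the definition concrete on a handful of tuples.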
14
Lineage for Relational Operators
N-ary relational operators (e.g., $\bowtie$)
Lineage of t according to op is the maximal subsets $R_i^* \subseteq R_i$ for i = 1..n such that
(1) $op(R_1^*, \dots, R_n^*) = \{t\}$
(2) $\forall t_i^* \in R_i^*: op(R_1, \dots, \{t_i^*\}, \dots, R_n) \neq \emptyset$
[Figure: op over inputs R1 and R2, with R1* and R2* highlighted]
15
Lineage for Relational Views
Lineage of a tuple set is union of lineage of each tuple in the set
Lineage for views is defined recursively
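[Figure: view V computed by operators op1 and op2 through intermediate view U over R1 and R2; tuple t in V traces back to U*, then to R1* and R2*]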
Lineage of t is R1*, R2*
16
Lineage Tracing
Convert view into a segmented normal form
E1 … En Each segment
Generate one tracing query for each segment
Apply tracing queries recursively
– # non-top + 1
Lineage result is unaffected by normalization and segment-level tracing
17
Tracing Query for One Segment
V = $\alpha_{Y,\,sum(Z)}(\sigma_{X>Z}(R \bowtie S))$
R(X, Y): (3, a), (8, b)
S(Y, Z): (a, 2), (b, 0), (b, 9), (b, 6)
V(Y, sum): (a, 2), (b, 6)
Tracing (b, 6):
TQ = $Split_{R,S}(\sigma_{Y=b}(\sigma_{X>Z}(R \bowtie S)))$
R*={(8,b)}, S*={(b,0),(b,6)}
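The tracing query for this segment can be written out step by step. The following Python sketch assumes in-memory sets of tuples and a hypothetical helper `tracing_query`; it joins R and S, applies the view's selection plus the selection on the traced tuple's group value, and then splits the result back onto the schemas of R and S.

```python
def tracing_query(R, S, t):
    """Split_{R,S}(sigma_{Y = t.Y}(sigma_{X > Z}(R join S))) for the example view."""
    y_t, _sum = t
    # R join S on the shared attribute Y
    joined = [(x, y, z) for (x, y) in R for (y2, z) in S if y == y2]
    # keep tuples that pass the view's selection and belong to t's group
    kept = [(x, y, z) for (x, y, z) in joined if x > z and y == y_t]
    # Split: project the surviving tuples back onto each source schema
    r_star = {(x, y) for (x, y, _z) in kept}
    s_star = {(y, z) for (_x, y, z) in kept}
    return r_star, s_star

R = {(3, "a"), (8, "b")}
S = {("a", 2), ("b", 0), ("b", 9), ("b", 6)}
print(tracing_query(R, S, ("b", 6)))
```

On the slide's data this returns R* = {(8, b)} and S* = {(b, 0), (b, 6)}.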
18
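Recursive Tracing Procedure
V = $\alpha_{W,\,avg(sum)}(\alpha_{Y,\,sum(Z)}(\sigma_{X>Z}(R \bowtie S)) \bowtie T)$
R(X, Y): (3, a), (8, b)
S(Y, Z): (a, 2), (b, 0), (b, 9), (b, 6)
T(Y, W): (a, p), (b, p), (b, q)
U(Y, sum): (a, 2), (b, 6)
V(W, avg): (p, 4), (q, 6)
Tracing (q, 6):
TQ1 = $Split_{U,T}(\sigma_{W=q}(U \bowtie T))$ gives U*={(b,6)}, T*={(b,q)}
TQ2 = $Split_{R,S}(\sigma_{Y=b}(\sigma_{X>Z}(R \bowtie S)))$ gives R*={(8,b)}, S*={(b,0),(b,6)}
R*={(8,b)}, S*={(b,0),(b,6)}, T*={(b,q)}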
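The two tracing queries chain together: the U tuples returned by the top-level query become the tuples traced through the lower segment. A minimal Python sketch of that recursion follows, under the assumption that the intermediate view U is recomputed on demand; the function names are illustrative, not taken from the talk.

```python
def compute_U(R, S):
    """U = group by Y, sum Z over the joined tuples with X > Z."""
    sums = {}
    for (x, y) in R:
        for (y2, z) in S:
            if y == y2 and x > z:
                sums[y] = sums.get(y, 0) + z
    return set(sums.items())

def trace_top(U, T, t):
    """TQ1: split the (U join T) tuples whose W equals t.W."""
    w = t[0]
    pairs = [(u, tt) for u in U for tt in T if u[0] == tt[0] and tt[1] == w]
    return {u for u, _ in pairs}, {tt for _, tt in pairs}

def trace_lower(R, S, u):
    """TQ2: split the (R join S) tuples with X > Z that fall in u's group."""
    y = u[0]
    pairs = [(r, s) for r in R for s in S if r[1] == s[0] == y and r[0] > s[1]]
    return {r for r, _ in pairs}, {s for _, s in pairs}

def trace(R, S, T, t):
    """Apply the tracing queries top-down and union the per-tuple results."""
    u_star, t_star = trace_top(compute_U(R, S), T, t)
    r_star, s_star = set(), set()
    for u in u_star:
        r_part, s_part = trace_lower(R, S, u)
        r_star |= r_part
        s_star |= s_part
    return r_star, s_star, t_star

R = {(3, "a"), (8, "b")}
S = {("a", 2), ("b", 0), ("b", 9), ("b", 6)}
T = {("a", "p"), ("b", "p"), ("b", "q")}
print(trace(R, S, T, ("q", 6)))
```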
19
Making It Efficient
Source accesses are usually expensive or impossible
Need some intermediate results for lineage tracing
Store auxiliary views at the warehouse
– Reduce or eliminate source accesses
– Reduce recomputation of intermediate results
20
Auxiliary Views
There are many possible auxiliary views
For single-segment views
– Identified 10 possible auxiliary view schemes
– Studied performance tradeoffs
For arbitrary views
– Hard optimization problem
– Exhaustive and heuristic algorithms
– Performance study
[Figure: view over R1 … Rn]
21
Auxiliary Views: Performance Tradeoffs
+ Always improve lineage tracing
– Must be maintained when sources change
+ Can also help with maintenance of original user views
22
Auxiliary View Schemes for Single-Segment Views
Parameters:
- 3-way SPJ view
- sources: 10MB each
- disk: 1Mbps
- network: 50kbps
- 1000 operations
- q/u ratio = 4
Measurements:
- tracing time
- maintenance time
23
Auxiliary View Selection Algorithms for Arbitrary Views
24
Part 2: Transformation Graphs
Lineage definition
Tracing algorithms
Combining transformations for lineage tracing
Experimental results (tiny sample)
[Figure: transformation graph T1–T6 from Source 1, Source 2, and Source 3 to the Data Warehouse]
25
Transformation Example
[Figure: transformation graph T1–T7 from Order and Product to SalesJump, with steps labeled selection, “join”, split, pivot, projection, selection, projection]
Order
id  cust  date      prod-list
1   A     2/8/99    1(10),2(10)
2   C     4/5/99    2(5),3(10)
3   D     6/1/99    1(20),2(10)
4   B     8/6/99    1(10),3(5)
5   D     10/8/99   1(5),3(10)
6   B     12/1/99   2(10),3(10)
Product
id  name  price  valid
1   imac  1200   10/1/98-
2   vaio  2400   6/1/98-9/1/99
2   vaio  1800   9/2/99-
3   palm  500    2/1/98-7/1/98
3   palm  400    7/2/98-9/1/99
3   palm  300    9/2/99-
SalesJump
name  avg3  Q4
palm  2K    6K
Highlighted lineage of (palm, 2K, 6K):
– Product: (3, palm, 400, 7/2/98-9/1/99), (3, palm, 300, 9/2/99-)
– Order: (2, C, 4/5/99, 2(5),3(10)), (4, B, 8/6/99, 1(10),3(5)), (5, D, 10/8/99, 1(5),3(10)), (6, B, 12/1/99, 2(10),3(10))
26
Lineage for General Transformations
A transformation T can be an arbitrary program
– select … from … where …
– main(int argc, char** argv) {…}
– sed “s/string1/string2/g” …
How do we trace lineage through T?
– One extreme: relational operators
– Another extreme: we know nothing about T
– Middle ground: based on transformation properties
27
Transformation Properties
Transformation classes
Additional properties
– Transformation subclasses
– Schema information
– Provided inverse or tracing procedure
28
Transformation Classes
dispatcher
$\forall I: T(I) = \bigcup_{i \in I} T(\{i\})$
$T^*(o) = \{\, i \in I \mid o \in T(\{i\}) \,\}$
29
Dispatcher Example
T1: Order → O1
Order
id  cust  date      prod-list
1   A     2/8/99    1(10),2(10)
2   C     4/5/99    2(5),3(10)
3   D     6/1/99    1(20),2(10)
4   B     8/6/99    1(10),3(5)
5   D     10/8/99   1(5),3(10)
6   B     12/1/99   2(10),3(10)
O1
id  cust  date      pid  quant
1   A     2/8/99    1    10
1   A     2/8/99    2    10
:   :     :
5   D     10/8/99   1    5
5   D     10/8/99   3    10
6   B     12/1/99   2    10
6   B     12/1/99   3    10
Highlighted: O1 row (5, D, 10/8/99, 3, 10) and its lineage, Order row (5, D, 10/8/99, 1(5),3(10))
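For a dispatcher, tracing follows directly from the definition T*(o) = {i | o ∈ T({i})}: apply T to each input item on its own and keep the items whose output contains o. The Python sketch below assumes a hypothetical `split_order` function standing in for T1 and a prod-list format of `pid(quant)` entries, as shown in the Order table.

```python
import re

def split_order(orders):
    """Stand-in for T1: one output row per product listed in prod-list."""
    out = set()
    for (oid, cust, date, prod_list) in orders:
        for pid, quant in re.findall(r"(\d+)\((\d+)\)", prod_list):
            out.add((oid, cust, date, int(pid), int(quant)))
    return out

def trace_dispatcher(T, I, o):
    """T*(o): input items whose individual output contains o (|I| calls to T)."""
    return {i for i in I if o in T({i})}

ORDER = {
    (1, "A", "2/8/99", "1(10),2(10)"),
    (2, "C", "4/5/99", "2(5),3(10)"),
    (3, "D", "6/1/99", "1(20),2(10)"),
    (4, "B", "8/6/99", "1(10),3(5)"),
    (5, "D", "10/8/99", "1(5),3(10)"),
    (6, "B", "12/1/99", "2(10),3(10)"),
}

print(trace_dispatcher(split_order, ORDER, (5, "D", "10/8/99", 3, 10)))
```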
30
Transformation Classes
dispatcher
$\forall I: T(I) = \bigcup_{i \in I} T(\{i\})$
$T^*(o) = \{\, i \in I \mid o \in T(\{i\}) \,\}$
aggregator
$\forall I$ and $T(I) = \{o_1 \dots o_n\}$: $\exists$ unique partition $I_1 .. I_n$ of $I$ s.t. $T(I_k) = \{o_k\}$
$T^*(o_k) = I_k$
31
Aggregator Example
T4: O3 → O4
O3
oid  name  date      price  quant
1    imac  2/8/99    1200   10
1    vaio  2/8/99    2400   10
2    vaio  4/5/99    2400   5
2    palm  4/5/99    400    10
3    imac  6/1/99    1200   20
3    vaio  6/1/99    2400   10
4    imac  8/6/99    1200   10
4    palm  8/6/99    400    5
5    imac  10/8/99   1200   5
5    palm  10/8/99   300    10
6    vaio  12/1/99   1800   10
6    palm  12/1/99   300    10
O4
name  Q1   Q2   Q3   Q4
imac  12K  24K  12K  6K
vaio  24K  12K  24K  18K
palm  0K   4K   2K   6K
Highlighted: O4 row (palm, 0K, 4K, 2K, 6K) and its lineage, the O3 rows (2, palm, 4/5/99, 400, 10), (4, palm, 8/6/99, 400, 5), (5, palm, 10/8/99, 300, 10), (6, palm, 12/1/99, 300, 10)
32
Transformation Classes
dispatcher
$\forall I: T(I) = \bigcup_{i \in I} T(\{i\})$
$T^*(o) = \{\, i \in I \mid o \in T(\{i\}) \,\}$
aggregator
$\forall I$ and $T(I) = \{o_1 \dots o_n\}$: $\exists$ unique partition $I_1 .. I_n$ of $I$ s.t. $T(I_k) = \{o_k\}$
$T^*(o_k) = I_k$
black-box
All others
$T^*(o) = I$
33
Transformation Classes
Most transformations are dispatchers, aggregators, or their compositions
A transformation can be both dispatcher and aggregator
– Lineage definitions are equivalent
Transformations can be relational operators
– Lineage definitions same as relational definitions
34
Transformation Properties
Transformation classes
Additional properties
– Transformation subclasses
– Schema information
– Provided inverse or tracing procedure
35
Transformation Subclasses
Permit more efficient lineage tracing
Filter is a special dispatcher
– Each input data item produces itself or nothing
Context-free aggregator
– Whether two input data items are in the same partition is independent of other items
Key-preserving aggregator
– Any subset of an input partition always produces the same output key
36
Tracing Example: Aggregators
Consider T(I) = {o1…on}
Tracing the lineage of o for aggregator
– Partition input I into I1…In such that T(Ik) = {ok}
– Return Ik such that T(Ik) = {o}
Tracing the lineage of o for context-free aggregator
– Partition input I into I1…In such that |T(Ik)| = 1
– Return Ik such that T(Ik) = {o}
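The context-free case can be sketched in Python as follows. This is an illustrative sketch, not the talk's TraceCF procedure: it assumes that two items share a partition exactly when T applied to the pair yields a single output, and it compares each item with one representative per partition. The `pivot_sales` function is a hypothetical stand-in for the pivot T4 (quarter derived from the month, totals as price × quant), and the data is a subset of the O3 rows.

```python
def pivot_sales(rows):
    """Stand-in for T4: total sales per product name and calendar quarter."""
    totals = {}
    for (_oid, name, date, price, quant) in rows:
        quarter = (int(date.split("/")[0]) - 1) // 3
        totals.setdefault(name, [0, 0, 0, 0])[quarter] += price * quant
    return {(name, *qs) for name, qs in totals.items()}

def trace_context_free(T, I, o):
    """Partition I pairwise so that |T(Ik)| = 1, then return the Ik with T(Ik) = {o}."""
    partitions = []
    for item in I:
        for part in partitions:
            representative = next(iter(part))
            if len(T({representative, item})) == 1:   # same partition
                part.add(item)
                break
        else:
            partitions.append({item})                 # start a new partition
    for part in partitions:
        if T(part) == {o}:
            return part
    return set()

O3 = {
    (1, "imac", "2/8/99", 1200, 10), (3, "imac", "6/1/99", 1200, 20),
    (4, "imac", "8/6/99", 1200, 10), (5, "imac", "10/8/99", 1200, 5),
    (2, "palm", "4/5/99", 400, 10), (4, "palm", "8/6/99", 400, 5),
    (5, "palm", "10/8/99", 300, 10), (6, "palm", "12/1/99", 300, 10),
}

print(trace_context_free(pivot_sales, O3, ("palm", 0, 4000, 2000, 6000)))
```

Each item is compared against at most one representative per partition, so the number of calls to T stays within the quadratic bound.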
37
Schema Information
Input schema A=(A1…An) and key Akey
Output schema B=(B1…Bn) and key Bkey
Schema mappings: $f(A) \rightarrow B$ and $A \leftarrow g(B)$
Transformations with special schema mappings
– Forward key-map: $f(A) \rightarrow B_{key}$
– Backward key-map: $A_{key} \leftarrow g(B)$
– Backward total-map: $A \leftarrow g(B)$
38
Tracing Example: Forward Key-Maps
T4: O3 → O4
O3
oid  name  date      price  quant
1    imac  2/8/99    1200   10
1    vaio  2/8/99    2400   10
2    vaio  4/5/99    2400   5
2    palm  4/5/99    400    10
3    imac  6/1/99    1200   20
3    vaio  6/1/99    2400   10
4    imac  8/6/99    1200   10
4    palm  8/6/99    400    5
5    imac  10/8/99   1200   5
5    palm  10/8/99   300    10
6    vaio  12/1/99   1800   10
6    palm  12/1/99   300    10
O4
name  Q1   Q2   Q3   Q4
imac  12K  24K  12K  6K
vaio  24K  12K  24K  18K
palm  0K   4K   2K   6K
Highlighted: O4 row (palm, 0K, 4K, 2K, 6K) and its lineage, the O3 rows (2, palm, 4/5/99, 400, 10), (4, palm, 8/6/99, 400, 5), (5, palm, 10/8/99, 300, 10), (6, palm, 12/1/99, 300, 10)
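With a forward key-map, tracing needs no calls to T: the lineage of o is the set of input items whose mapped value f(i) equals o's key. The following Python sketch assumes, for illustration only, that f projects the name attribute of O3 and that name is the key of O4; `trace_forward_keymap` is a hypothetical helper, not the talk's TraceFM.

```python
def trace_forward_keymap(I, f, out_key):
    """Scan the input once and keep the items that map to the traced output's key."""
    return {i for i in I if f(i) == out_key}

def name_of(row):
    """Assumed forward key-map f: the name attribute of an O3 row."""
    return row[1]

O3 = {
    (1, "imac", "2/8/99", 1200, 10), (1, "vaio", "2/8/99", 2400, 10),
    (2, "vaio", "4/5/99", 2400, 5), (2, "palm", "4/5/99", 400, 10),
    (3, "imac", "6/1/99", 1200, 20), (3, "vaio", "6/1/99", 2400, 10),
    (4, "imac", "8/6/99", 1200, 10), (4, "palm", "8/6/99", 400, 5),
    (5, "imac", "10/8/99", 1200, 5), (5, "palm", "10/8/99", 300, 10),
    (6, "vaio", "12/1/99", 1800, 10), (6, "palm", "12/1/99", 300, 10),
}

palm_lineage = trace_forward_keymap(O3, name_of, "palm")
print(sorted(palm_lineage))
```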
39
Other Properties
Provided Tracing Procedure
Provided Transformation Inverse $T^{-1}$
– If T is an aggregator, then o’s lineage is $T^{-1}(\{o\})$
– Not always true for dispatchers or black-boxes
40
Tracing Procedures
Property                 Procedure   # T Calls   # Accesses
dispatcher               TraceDS     O(|I|)      O(|I|)
aggregator               TraceAG     O(2^|I|)    O(2^|I|)
black-box                return I;   0           O(|I|)
filter                   return o;   0           0
context-free aggr.       TraceCF     O(|I|^2)    O(|I|^2)
key-preserving aggr.     TraceKP     O(|I|)      O(|I|)
forward key-map          TraceFM     0           O(|I|)
backward key-map         TraceBM     0           O(|I|)
backward total-map       TraceTM     0           0
provided tracing-proc.   provided    ?           ?
41
Property Hierarchy
[Figure: hierarchy rooted at ANY over the properties provided tracing-proc. or inverse, black-box, aggregator, dispatcher, context-free aggr., key-preserving aggr., filter, forward key-map, backward key-map, total-map]
42
Summary of Our Approach for One Transformation
Properties are provided with transformations
– Specified by the transformation author
– Declared in prepackaged transformations
– Derived using recent techniques [Clio01, RB01]
The best property of a transformation is selected based on the hierarchy
The tracing procedure using the best property is called at tracing time
Indexing techniques
43
Transformation Sequences
Naive algorithm traces backwards one transformation at a time
– Need all intermediate results
– Poor performance for long sequences
[Figure: I → T1 → T2 → T3 → … → Tn → O]
44
Transformation Sequences
[Figure: the sequence I → T1 → T2 → T3 → … → Tn → O, and the combined sequence I → T’ → … → Tn → O]
Combine transformations and trace as one
– Reduces number of intermediate results
– By combining judiciously: reduces tracing cost, doesn’t lose accuracy
45
Overall Approach
Algorithm for deriving properties of T = T1 T2 from properties of T1 and T2
Coarse-grained cost metric for a tracing sequence based on transformation properties
Greedy algorithm
46
Example of Greedy Algorithm
[Figure: greedy combination of the sequence T4, T5, T6, T7, whose properties and cost levels are fkmap(2), btmap(1), filter(1), bkmap(2); candidate combinations include blkbox(5); the final sequence is T4’ with fkmap(2) and T6’ with bkmap(2)]
47
Multiple-Input Example
T3: O1, O2 → O3 (dispatcher with respect to each input)
O1
id  cust  date      pid  quant
1   A     2/8/99    1    10
1   A     2/8/99    2    10
:   :     :
5   D     10/8/99   1    5
5   D     10/8/99   3    10
6   B     12/1/99   2    10
6   B     12/1/99   3    10
O2
id  name  price  valid
1   imac  1200   10/1/98-
2   vaio  2400   6/1/98-9/1/99
2   vaio  1800   9/2/99-
3   palm  400    7/2/98-9/1/99
3   palm  300    9/2/99-
O3
oid  name  date      price  quant
1    imac  2/8/99    1200   10
1    vaio  2/8/99    2400   10
:    :     :
5    imac  10/8/99   1200   5
5    palm  10/8/99   300    10
6    vaio  12/1/99   1800   10
6    palm  12/1/99   300    10
Highlighted: O3 row (5, palm, 10/8/99, 300, 10) and its lineage, O1 row (5, D, 10/8/99, 3, 10) and O2 row (3, palm, 300, 9/2/99-)
48
Transformation Graphs
[Figure: transformation graph from inputs I1 and I2 to output O]
Definition time
– Specify properties of each transformation in graph
49
Transformation Graphs
Definition time
– Specify properties of each transformation in graph
– Consider each path as a transformation sequence
– Combine transformations in each sequence
[Figure: transformation graph from inputs I1 and I2 to output O]
50
Transformation Graphs
Definition time
Load time
– Save intermediate results and build indices as desired
Tracing time
– Trace lineage through each sequence
– Combine results
[Figure: transformation graph from inputs I1 and I2 to output O]
51
Example Revisited
[Figure: the graph T1–T7 from Order and Product to SalesJump, annotated with the properties bkmap, bkmap, dispatcher, fkmap, filter, btmap, filter, dispatcher; below it, the same graph after combination, annotated with bkmap, fkmap, bkmap, dispatcher]
52
Experimental Results
Transformation graph based on a complex TPC-D query (Q12)
53
Part 3: View Update Using Data Lineage
View update: translating updates on views to updates on base tables
Obvious connection to lineage in case of view deletions
Fresh approach with improved results
54
View Update Translations: Valid and Exact
[Figure: tuple t in view V over base tables R1, R2, …, Rn]
57
Our Algorithm
Uses lineage to:
– Find an exact translation whenever one exists (in linear time for many cases)
– Find a “good” translation when no exact translation exists
Fully automatic
Previous approaches
– Don’t always find an exact translation
– Often require user input
– Consider restricted classes of views
58
Related Work
Schema-level lineage tracing (annotation-based) [BB99, HQGW93, RS98]
Drill-down or drill-through on data cubes [Gray95]
“Weak inverse” for transformations [WS97]
Warehouse load resumption [LGMW00]
Data cleaning [GFSS+01]
View update [DB82, Mas84, Kel85]
59
Conclusions
Data lineage problem in two scenarios
– Warehouse defined by relational views
– Warehouse defined by general data transformations
For both scenarios, we provide:
– Formal lineage definition
– Lineage tracing algorithms
– Optimization techniques
– System prototype and performance study
Use lineage for the view update problem
60
Some Open Problems
Lineage of “missing” view or base tuples
Deriving transformation properties
Combining with annotation-based approach
View update
– Translation ambiguity
– Base table constraints
– Multiple interacting views
62
Lineage Applications
On-line analytical processing (OLAP)
Scientific databases
Sensory and monitoring systems
Data cleaning
Warehouse resumption
Data security
View update
63
Lineage Tracing
Convert view definition into a segmented normal form
Generate one tracing query for each ASPJ segment
Apply tracing queries top-down through view definition
Lineage result is unaffected by normalization
[Figure: two view-definition trees over R, S, T with nodes V and W]
64
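Tracing Example
V = $\alpha_{X,\,avg(Z)}(\sigma_{K1<K2}(R \bowtie S))$
R(K1, X, Y): (1, a, p), (2, b, q), (3, c, r)
S(K2, X, Z): (1, b, 2), (2, a, 4), (3, b, 3), (4, d, 8), (5, b, 9)
V(X, avg): (a, 4), (b, 6)
Tracing (b, 6):
TQ = $Split_{R,S}(\sigma_{X=b}(\sigma_{K1<K2}(R \bowtie S)))$
Highlighted lineage: (2, b, q) in R; (3, b, 3), (5, b, 9) in S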
65
Split Lineage Tables (SLT)
[Figure: the tracing example with tables R(K1, X, Y), S(K2, X, Z), and V(X, avg), plus auxiliary tables R' and S' produced by Split; the lineage of (b, 6) is highlighted in R' and S']
66
Base Table Projections (BP)
[Figure: the tracing example with tables R(K1, X, Y), S(K2, X, Z), and V(X, avg), plus auxiliary projections R’(K1, X) and S’(K2, X); the lineage of (b, 6) is highlighted]
67
Context-Free Aggregator Example
T4: O3 → O4
O3
oid  name  date      price  quant
1    imac  2/8/99    1200   10
1    vaio  2/8/99    2400   10
2    vaio  4/5/99    2400   5
2    palm  4/5/99    400    10
3    imac  6/1/99    1200   20
3    vaio  6/1/99    2400   10
4    imac  8/6/99    1200   10
4    palm  8/6/99    400    5
5    imac  10/8/99   1200   5
5    palm  10/8/99   300    10
6    vaio  12/1/99   1800   10
6    palm  12/1/99   300    10
O4
name  Q1   Q2   Q3   Q4
imac  12K  24K  12K  6K
vaio  24K  12K  24K  18K
palm  0K   4K   2K   6K
Partitions of O3:
– (1, imac, 2/8/99, 1200, 10), (3, imac, 6/1/99, 1200, 20), (4, imac, 8/6/99, 1200, 10), (5, imac, 10/8/99, 1200, 5)
– (1, vaio, 2/8/99, 2400, 10), (2, vaio, 4/5/99, 2400, 5), (3, vaio, 6/1/99, 2400, 10), (6, vaio, 12/1/99, 1800, 10)
– (2, palm, 4/5/99, 400, 10), (4, palm, 8/6/99, 400, 5), (5, palm, 10/8/99, 300, 10), (6, palm, 12/1/99, 300, 10)
Highlighted: O4 row (palm, 0K, 4K, 2K, 6K) and its lineage, the palm partition
68
Tracing Example 1
Tracing procedure for context-free aggregators
– Partition input I into I1…In such that |T(Ik)| = 1;
– Return Ik s.t. T(Ik) = {o};
69
Lineage Equivalence
Lineages of equivalent SPJ views are equivalent
Not for ASPJ views
R(X, Y, Z): (3, a, 2), (8, b, 0), (8, b, 6)
U = $\alpha_{Y,\,sum(Z)}(R)$ (Y, sum): (a, 2), (b, 6)
Highlighted lineage of (b, 6): (8, b, 0), (8, b, 6)
70
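Lineage Equivalence
Lineages of equivalent SPJ views are equivalent
Not for ASPJ views
R(X, Y, Z): (3, a, 2), (8, b, 0), (8, b, 6)
U(Y, sum): (a, 2), (b, 6)
[Figure: U is now computed with a selection (extracted label “B=0”) applied before $\alpha_{Y,\,sum(Z)}$; the highlighted lineage of (b, 6) is only (8, b, 6)]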
73
Indices Help!
Conventional index
– On input key Akey for a backward key-map with $A_{key} \leftarrow g(B)$
Functional index
– On f(A) for a forward key-map with $f(A) \rightarrow B_{key}$
– On T(A) for a dispatcher
Lineage index
– Mapping the key of each output data item o to the keys of input data items in o’s lineage
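A lineage index can be pictured as a dictionary built once, for example at load time, and probed at tracing time. The Python sketch below builds one for a dispatcher by applying T to each input item; the names `build_lineage_index`, `split_order`, `in_key`, and `out_key` are illustrative assumptions, and the choice of (id, pid) as the output key is also an assumption.

```python
import re
from collections import defaultdict

def split_order(orders):
    """Stand-in dispatcher: one output row per product in prod-list."""
    out = set()
    for (oid, cust, date, prod_list) in orders:
        for pid, quant in re.findall(r"(\d+)\((\d+)\)", prod_list):
            out.add((oid, cust, date, int(pid), int(quant)))
    return out

def build_lineage_index(T, inputs, in_key, out_key):
    """Map each output key to the keys of the input items in its lineage."""
    index = defaultdict(set)
    for item in inputs:
        for out in T({item}):                 # dispatcher: trace item by item
            index[out_key(out)].add(in_key(item))
    return index

ORDER = {
    (4, "B", "8/6/99", "1(10),3(5)"),
    (5, "D", "10/8/99", "1(5),3(10)"),
}

index = build_lineage_index(
    split_order, ORDER,
    in_key=lambda row: row[0],                # assumed input key: order id
    out_key=lambda row: (row[0], row[3]),     # assumed output key: (id, pid)
)
print(index[(5, 3)])
```

At tracing time a lookup such as `index[(5, 3)]` replaces the scan over the input and the calls to T.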
74
Experimental Results
Tracing through an “SP” transformation over TPC-D table PartSupp
75
Tracing Through Sequences
Tracing cost estimation
– Divide properties into 5 groups
– T’s cost level depends on the group of its best property
– Associate a sequence with N[1..5] where N[k] records the number of transformations with cost level k
Greedy algorithm
– Pick a combination that results in the lowest N
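A rough Python sketch of this greedy step follows. It rests on two assumptions that the slides do not spell out: cost vectors are compared from the highest cost level down, and the cost level of a combined transformation comes from a caller-supplied rule. The rule `example_combined_level` is purely hypothetical and stands in for the property-derivation algorithm; the input levels 2, 1, 1, 2 are the ones shown in the greedy example.

```python
def cost_vector(levels):
    """N[1..5] as a tuple ordered from level 5 down to level 1 (assumed comparison order)."""
    counts = [0] * 5
    for level in levels:
        counts[level - 1] += 1
    return tuple(reversed(counts))

def greedy_combine(levels, combined_level):
    """Repeatedly merge the adjacent pair that gives the lowest cost vector."""
    seq = list(levels)
    while len(seq) > 1:
        best = None
        for k in range(len(seq) - 1):
            merged = combined_level(seq[k], seq[k + 1])
            candidate = seq[:k] + [merged] + seq[k + 2:]
            if best is None or cost_vector(candidate) < cost_vector(best):
                best = candidate
        if cost_vector(best) >= cost_vector(seq):
            break                              # no combination lowers N
        seq = best
    return seq

def example_combined_level(a, b):
    """Hypothetical rule, NOT from the slides: level-1 steps merge for free,
    any other pair degrades to a level-5 black box."""
    if a == 1 or b == 1:
        return max(a, b)
    return 5

print(greedy_combine([2, 1, 1, 2], example_combined_level))
```

Under these assumptions the sequence shrinks to two level-2 transformations and stops, because merging them would produce a level-5 black box.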
76
Lineage Annotation (Appendix)
[Figure: data items 1–4 flowing through T1 and T2, with outputs annotated by lineage sets such as {1}, {1,2}, {2,4}, {4}, and {1,2,4}; T1* and T2* trace back]
77
Multiple Inputs and Outputs
Define properties for each input and output
Trace lineage for each input/output pair using single-input single-output tracing procedures
[Figure: transformation T with inputs I1, I2, ..., Im and outputs O1, O2, ..., On]
78
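View Update
Deletions on SPJ view → deletions on base database
View tuple deletion request –t and base tuple deletion $\Delta D$
$\Delta D$ is a translation for –t if $\{t\} \subseteq \Delta V = V(D) - V(D - \Delta D)$
Side-effect $E = \Delta V - \{t\}$; $\Delta D$ is exact if $E = \emptyset$
[Figure: view V over database D; a deletion from V is translated into a deletion $\Delta D$ from D]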
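These definitions can be exercised on a toy view. The Python sketch below assumes a two-table join view and a hypothetical helper `side_effect` that returns `None` when the candidate deletion is not a translation, and the side-effect set E otherwise; none of these names come from the talk.

```python
def join_view(R, S):
    """Toy SPJ view: project (X, Z) from R(X, Y) joined with S(Y, Z)."""
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}

def side_effect(view, R, S, dR, dS, t):
    """Return E for the deletion (dR, dS), or None if it is not a translation for -t."""
    delta_v = view(R, S) - view(R - dR, S - dS)    # view tuples that disappear
    if t not in delta_v:
        return None
    return delta_v - {t}

R = {(1, "a"), (2, "a")}
S = {("a", 10)}
t = (1, 10)

print(side_effect(join_view, R, S, {(1, "a")}, set(), t))   # exact: E is empty
print(side_effect(join_view, R, S, set(), {("a", 10)}, t))  # valid, but (2, 10) also disappears
print(side_effect(join_view, R, S, {(2, "a")}, set(), t))   # not a translation
```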
79
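Relationships to Data Lineage
For an SPJ view:
[Figure: tuple t in $V = \pi_A(\sigma_C(R_1 \bowtie R_2 \bowtie \dots \bowtie R_n))$]
$t_i \in R_i$ belongs to t’s lineage $R_i^*$ iff $\{t\} \subseteq \pi_A(\sigma_C(R_1 \bowtie \dots \{t_i\} \dots \bowtie R_n))$
$t_i$ belongs to t’s exclusive lineage $R_i^{**}$ iff $\{t\} = \pi_A(\sigma_C(R_1 \bowtie \dots \{t_i\} \dots \bowtie R_n))$
Intuition: $t_i$ contributes only to t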
81
Relationships to Data Lineage
Deleting a lineage branch Ri* of t is always a translation for –t
[Figure: tuple t in $V = \pi_A(\sigma_C(R_1 \bowtie R_2 \bowtie \dots \bowtie R_n))$]
82
Relationships to Data Lineage
Deleting a lineage branch Ri* of t is always a translation for –t
Deleting any subset of t’s exclusive lineage D** never causes a side-effect
[Figure: tuple t in $V = \pi_A(\sigma_C(R_1 \bowtie R_2 \bowtie \dots \bowtie R_n))$]
83
Relationships to Data Lineage
Deleting a lineage branch Ri* of t is always a translation for –t
Deleting any subset of t’s exclusive lineage D** never causes a side-effect
If –t has an exact translation $\Delta D$, it must also have an exact translation within t’s lineage
[Figure: tuple t in $V = \pi_A(\sigma_C(R_1 \bowtie R_2 \bowtie \dots \bowtie R_n))$]
84
Translating View Tuple Deletions
DELETE(t, V, D)
  compute lineage D* and exclusive lineage D**;
  IF D** is a translation THEN RETURN;
  IF $\exists i$ s.t. Ri* causes no side-effect THEN RETURN;
  FOR each subset $\Delta D$ of D* DO
    IF $\Delta D$ is not a translation THEN prune all subsets of $\Delta D$;
    ELSE IF $\Delta D$ causes a side-effect THEN prune all supersets of $\Delta D$;
    ELSE RETURN;
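The procedure can be sketched in Python as below. This is an illustrative sketch under several assumptions rather than the implementation from the talk: the database is a dict from table name to a set of tuples, the view is a function of that dict, lineage is computed by substituting one tuple at a time as in the SPJ definitions, subsets are enumerated smallest first, and the function returns `None` where the talk's algorithm would fall back to a “good” translation. All function names are hypothetical.

```python
from itertools import combinations

def delete_from(db, deletions):
    """Copy of db without the tagged (table, row) pairs in deletions."""
    return {name: {r for r in rows if (name, r) not in deletions}
            for name, rows in db.items()}

def effect(view, db, deletions, t):
    """Return (is_translation, side_effect) for a candidate deletion set."""
    delta_v = view(db) - view(delete_from(db, deletions))
    return t in delta_v, delta_v - {t}

def lineage(view, db, t, exclusive=False):
    """Tagged tuples in t's lineage D* (or exclusive lineage D**)."""
    found = set()
    for name, rows in db.items():
        for row in rows:
            out = view({**db, name: {row}})     # replace Ri by {ti}
            if (out == {t}) if exclusive else (t in out):
                found.add((name, row))
    return found

def translate_deletion(view, db, t):
    d_star = lineage(view, db, t)
    d_excl = lineage(view, db, t, exclusive=True)
    # Step 1: the exclusive lineage never causes a side-effect.
    if d_excl and effect(view, db, d_excl, t)[0]:
        return d_excl
    # Step 2: a lineage branch Ri* is always a translation; accept it if exact.
    for name in db:
        branch = {(n, r) for (n, r) in d_star if n == name}
        ok, side = effect(view, db, branch, t)
        if ok and not side:
            return branch
    # Step 3: search the subsets of D* with the two pruning rules.
    not_translation, has_side_effect = [], []
    items = sorted(d_star)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            cand = set(combo)
            if any(cand <= big for big in not_translation):
                continue                         # subset of a non-translation
            if any(small <= cand for small in has_side_effect):
                continue                         # superset of a side-effect set
            ok, side = effect(view, db, cand, t)
            if not ok:
                not_translation.append(cand)
            elif side:
                has_side_effect.append(cand)
            else:
                return cand
    return None                                  # no exact translation found

def join_view(db):
    """Toy SPJ view: project (X, Z) from R(X, Y) joined with S(Y, Z)."""
    return {(x, z) for (x, y) in db["R"] for (y2, z) in db["S"] if y == y2}

db = {"R": {(1, "a"), (2, "a")}, "S": {("a", 10)}}
print(translate_deletion(join_view, db, (1, 10)))
```

In this toy case the exclusive lineage of (1, 10) is the single R tuple (1, a), so the first step already returns an exact translation.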