DATA FUSION
![Page 1: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/1.jpg)
DATA FUSION
Resolving Inconsistencies at Schema, Tuple and Value Level
Naveen Rajamoorthy, Nachiappan Chidambaram, Arunkarthikeyan Palaniswamy, Sriramakrishnan Soundarrajan
![Page 2: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/2.jpg)
Need for Data Fusion
To compare different data sets.
Examples:
◦ Shopping agents
◦ Disaster management systems
![Page 3: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/3.jpg)
GOALS OF DATA INTEGRATION
Completeness
◦ amount of data (number of attributes and tuples)
◦ achieved by adding more data sources
Conciseness
◦ number of unique objects and number of unique attributes of the objects
◦ achieved by reducing schematic inconsistencies through schema mapping
Correctness
◦ validity of data
◦ achieved by performing duplicate detection and data fusion
Pipeline: Data Sources -> Schema Mapping -> Duplicate Detection -> Data Fusion
![Page 4: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/4.jpg)
Humboldt Merger (HumMer)
◦ Fuses data from heterogeneous sources.
◦ All steps are performed at run time.
◦ Data cleaning
◦ Maximum flexibility
![Page 5: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/5.jpg)
Heterogeneous and Dirty data
Three Steps
1. Schema Matching and Data Transformation
2. Duplicate Detection
3. Data Fusion
Components of Data Fusion
![Page 6: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/6.jpg)
Three Steps in Data Fusion
Resolve inconsistencies at schema level
Resolve inconsistencies at tuple level
Resolve inconsistencies at value level
![Page 7: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/7.jpg)
Schema Matching and Data Transformation
![Page 8: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/8.jpg)
Schema Matching
The process of resolving schematic heterogeneity.
1. DUMAS schema matching algorithm (Duplicate-based Matching of Schemas)
2. TF-IDF similarity (term frequency–inverse document frequency)
![Page 9: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/9.jpg)
![Page 10: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/10.jpg)
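Example
Consider the relations R(A, B, C, D, E) and S(B’, F, E’, G).

| R | A | B | C | D | E |
|---|---|---|---|---|---|
| R1 | John | Doe | M | (408)7573339 | (408)7573338 |
| R2 | Joe | Smith | M | (249)3615616 | (249)2342366 |
| R3 | Suzy | Klein | F | (358)2436321 | (358)2436321 |
| R4 | Sam | Adams | M | (541)8127100 | (541)8121164 |

| S | B’ | F | E’ | G |
|---|---|---|---|---|
| S1 | Doe | Jdoe | 408-9182043 | XP |
| S2 | Deen | Jdean | 369-3663625 | XP |
| S3 | Klein | suzy | 358-2436321 | Unix |
| S4 | Adams | Adams | 541-8121164 | W2000 |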
![Page 11: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/11.jpg)
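Example
R3 and S3 form a duplicate pair: both rows describe Suzy Klein.

| R | A | B | C | D | E |
|---|---|---|---|---|---|
| R3 | Suzy | Klein | F | (358)2436321 | (358)2436321 |

| S | B’ | F | E’ | G |
|---|---|---|---|---|
| S3 | Klein | Suzy | 358-2436321 | Unix |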
![Page 12: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/12.jpg)
![Page 13: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/13.jpg)
Schema Matching
Overlap of the R and S schemas:

| Attributes in R | Attributes in S |
|---|---|
| A | ---- |
| B | B’ |
| C | ---- |
| D | ---- |
| E | E’ |
| ---- | F |
| ---- | G |
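The idea behind duplicate-based matching can be sketched in a few lines of Python. This is a simplified illustration, not the DUMAS algorithm itself: it assumes the duplicate pairs are already known (R3/S3 and, as an assumption made here, R4/S4), replaces the TF-IDF-based soft comparison with an exact comparison of normalized values, and uses a greedy assignment with a hypothetical threshold of 0.75.

```python
import re
from itertools import product

R_ATTRS = ["A", "B", "C", "D", "E"]
S_ATTRS = ["B'", "F", "E'", "G"]

# Duplicate pairs taken from the example tables (R3/S3 and R4/S4).
# Treating R4/S4 as a known duplicate is an assumption of this sketch.
duplicate_pairs = [
    (("Suzy", "Klein", "F", "(358)2436321", "(358)2436321"),
     ("Klein", "suzy", "358-2436321", "Unix")),
    (("Sam", "Adams", "M", "(541)8127100", "(541)8121164"),
     ("Adams", "Adams", "541-8121164", "W2000")),
]


def normalize(value):
    """Lower-case a value and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def match_schemas(pairs, threshold=0.75):
    """Match attributes whose values agree in most duplicate pairs."""
    scores = {}
    for (i, r_attr), (j, s_attr) in product(enumerate(R_ATTRS), enumerate(S_ATTRS)):
        hits = sum(normalize(r[i]) == normalize(s[j]) for r, s in pairs)
        scores[(r_attr, s_attr)] = hits / len(pairs)

    # Greedy one-to-one assignment, best-scoring attribute pairs first.
    matches, used_s = {}, set()
    for (r_attr, s_attr), score in sorted(scores.items(), key=lambda kv: -kv[1]):
        if score >= threshold and r_attr not in matches and s_attr not in used_s:
            matches[r_attr] = s_attr
            used_s.add(s_attr)
    return matches


print(match_schemas(duplicate_pairs))
```

With R3/S3 alone, D and E would both look like matches for E’, because R3 carries the same number in both columns; the second duplicate pair is what separates them.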
![Page 14: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/14.jpg)
Transformation
◦ A preferred schema is chosen.
◦ Attribute names are determined by the preferred schema; matching attributes of the other sources are renamed accordingly.
◦ A sourceID attribute is added to all tables in the schema.
![Page 15: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/15.jpg)
Duplicate Detection
![Page 16: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/16.jpg)
EXAMPLE
Source A
<pub>
  <Name>Database Systems: The Complete Book</Name>
  <Authors>
    <Author>Hector Garcia-Molina</Author>
    <Author>Jeffrey D. Ullman</Author>
    <Author>Jennifer D. Widom</Author>
  </Authors>
</pub>
Source B
<publication>
  <title>Database Systems: The Complete Book</title>
  <author>Molina &amp; Ullman</author>
  <year>1990</year>
</publication>
![Page 17: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/17.jpg)
SCHEMA MAPPING
Source A
<pub>
  <Name>Database Systems: The Complete Book</Name>
  <Authors>
    <Author>Hector Garcia-Molina</Author>
    <Author>Jeffrey D. Ullman</Author>
    <Author>Jennifer D. Widom</Author>
  </Authors>
</pub>
Source B
<publication>
  <title>Database Systems: The Complete Book</title>
  <author>Molina &amp; Ullman</author>
  <year>1990</year>
</publication>
Target schema
<pub>
  <title> </title>
  <Authors>
    <author> </author>
    <author> </author>
  </Authors>
  <year> </year>
</pub>
![Page 18: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/18.jpg)
DATA TRANSFORMATION
Source A
<pub>
  <Name>Database Systems: The Complete Book</Name>
  <Authors>
    <Author>Hector Garcia-Molina</Author>
    <Author>Jeffrey D. Ullman</Author>
    <Author>Jennifer D. Widom</Author>
  </Authors>
</pub>
Source A after its XQuery transformation
<pub>
  <title>Database Systems: The Complete Book</title>
  <Authors>
    <Author>Hector Garcia-Molina</Author>
    <Author>Jeffrey D. Ullman</Author>
    <Author>Jennifer D. Widom</Author>
  </Authors>
</pub>
Source B after its XQuery transformation
<pub>
  <title>Database Systems: The Complete Book</title>
  <Author>Hector Garcia-Molina</Author>
  <Author>Jeffrey D. Ullman</Author>
  <Author>Jennifer D. Widom</Author>
  <year>1990</year>
</pub>
![Page 19: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/19.jpg)
DUPLICATE DETECTION AND FUSION
Detected duplicates
<pub>
  <title>Database Systems: The Complete Book</title>
  <Authors>
    <Author>Hector Garcia-Molina</Author>
    <Author>Jeffrey D. Ullman</Author>
    <Author>Jennifer D. Widom</Author>
  </Authors>
</pub>
<pub>
  <title>Database Systems: The Complete Book</title>
  <Author>Hector Garcia-Molina</Author>
  <Author>Jeffrey D. Ullman</Author>
  <Author>Jennifer D. Widom</Author>
  <year>1990</year>
</pub>
Fused result
<pub>
  <title>Database Systems: The Complete Book</title>
  <Authors>
    <Author>Hector Garcia-Molina</Author>
    <Author>Jeffrey D. Ullman</Author>
    <Author>Jennifer D. Widom</Author>
  </Authors>
  <year>1990</year>
</pub>
![Page 20: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/20.jpg)
QUESTION
Give the correct order in which integration needs to be carried out:
A) Data Transformation -> Schema Mapping -> Duplicate Detection -> Fusion
B) Duplicate Detection -> Data Transformation -> Schema Mapping -> Fusion
C) Schema Mapping -> Data Transformation -> Duplicate Detection -> Fusion
D) Data Transformation -> Schema Mapping -> Fusion -> Duplicate Detection
![Page 21: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/21.jpg)
Duplicate Detection
Problem
◦ Given one or more data sets, find all sets of objects that represent the same real-world entity.
Difficulties
◦ Duplicates are not identical: similarity measures are needed (Levenshtein, Jaccard, etc.).
◦ The volume is large, so not all pairs can be compared: partitioning strategies are needed (sorted neighborhood, blocking, etc.).
![Page 22: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/22.jpg)
PARTITIONING STRATEGIES
◦ General Strategy
◦ Sorted Neighborhood Method
![Page 23: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/23.jpg)
GENERAL STRATEGY
Compare each record with every other record and calculate a distance measure. If the database holds n records, n(n-1)/2 distance measures must be computed.

| X | Y | Z |
|---|---|---|
| Star Wars | Lucas | 1985 |
| Indiana Jones | Lucas | 1989 |
| Home Alone | Wright | 1991 |
| Starwars | George Lucas | 1985 |
| Shrek | Adamson | 2001 |
| Snatch | Ritcie | 1999 |

Number of records: n = 6
Number of distance measures to be computed: 6 * 5 / 2 = 15
With, say, 100,000 records, about 5 * 10^9 distance measures must be computed.
EXPENSIVE
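The n(n-1)/2 growth is easy to check numerically. The short Python sketch below only evaluates the formula for a few sizes; the function name is chosen here for illustration.

```python
from math import comb


def pairwise_comparisons(n):
    """Number of unordered record pairs among n records: n(n-1)/2."""
    return comb(n, 2)


for n in (6, 1_000, 100_000):
    print(f"n = {n:>7,}: {pairwise_comparisons(n):,} comparisons")
```

For the six movie records this gives 15 comparisons, and for 100,000 records 4,999,950,000, which is why comparing all pairs does not scale.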
![Page 24: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/24.jpg)
SORTED NEIGHBORHOOD METHOD
The Sorted Neighborhood Method reduces the number of candidate duplicate pairs.
1. One or more fields are chosen as the sorting key.
2. The database is sorted on this key.
3. A window of fixed size slides over the sorted database, and only records inside the same window are compared.
The technique generates O(wN) pairs, where w is the window size and N is the total number of records in the database.
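A minimal Python sketch of the sliding-window idea, using the six movie records from the general-strategy slide. The sorting key (normalized title), the window size, the threshold of 0.8 and the use of `difflib.SequenceMatcher` on titles only are all assumptions made for this illustration; the slides do not prescribe them.

```python
from difflib import SequenceMatcher

records = [
    ("Star Wars", "Lucas", 1985),
    ("Indiana Jones", "Lucas", 1989),
    ("Home Alone", "Wright", 1991),
    ("Starwars", "George Lucas", 1985),
    ("Shrek", "Adamson", 2001),
    ("Snatch", "Ritcie", 1999),
]


def sort_key(record):
    # Assumed key: the title, lower-cased, keeping only letters and digits.
    return "".join(ch for ch in record[0].lower() if ch.isalnum())


def sorted_neighborhood(rows, window=2, threshold=0.8):
    """Return (duplicate pairs, number of comparisons performed)."""
    ordered = sorted(rows, key=sort_key)
    comparisons = 0
    duplicates = []
    for i, left in enumerate(ordered):
        # Compare each record only with the next (window - 1) records.
        for right in ordered[i + 1 : i + window]:
            comparisons += 1
            sim = SequenceMatcher(None, sort_key(left), sort_key(right)).ratio()
            if sim >= threshold:
                duplicates.append((left, right))
    return duplicates, comparisons


duplicates, comparisons = sorted_neighborhood(records)
print(comparisons, duplicates)
```

With a window of 2 only 5 comparisons are made instead of 15, and "Star Wars" / "Starwars" is still found because sorting places the two records next to each other. A poorly chosen key can separate true duplicates, which is the usual trade-off of this method.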
![Page 25: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/25.jpg)
DUPLICATE DETECTION WITH DESCRIPTIONS
Criteria for attribute selection. Attributes should be:
(i) related to the currently considered object. Example: child elements having a foreign key constraint over the attributes of the parent table.
(ii) usable by our similarity measure. Example: attribute City, corresponding to attribute Zip code, cannot be used to calculate the similarity measure.
(iii) likely to distinguish duplicates from non-duplicates. Example: an attribute for Denomination is unlikely to distinguish duplicate records.
![Page 26: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/26.jpg)
DUPLICATE DETECTION WITH DESCRIPTIONS
Description: consider attributes from other tables that have a foreign key relationship with the existing tables. For efficiency, only direct child attributes are considered, i.e. descendants reached by following more than one reference are discarded.
Matched tables: Movie(Title, Year, Duration) and Film(Name, Date, Rating)
Child tables of Movie: Actor(Name, Movie), Actress(Name, Movie)
Child tables of Film: Actors(Name, Film), Prod-Com(Name, Film)
Let tables T1 and T2 be the two matched tables, and let {T1,1, . . . , T1,k} and {T2,1, . . . , T2,m} be their respective children tables.
Then every pair of tables (T1,i, T2,j), 1 <= i <= k, 1 <= j <= m, is matched.
Thus Actor(Movie), Actress(Movie) and Actors(Film) can also be used for duplicate detection.
![Page 27: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/27.jpg)
Example
Table 1

| ID | Country |
|---|---|
| 1 | USA |
| 2 | United States |
| 3 | US |

Table 2

| ID | City | Country ID |
|---|---|---|
| 1 | Charlotte | 1 |
| 2 | California | 1 |
| 3 | Charlotte | 2 |
| 4 | California | 2 |
| 5 | Charlotte | 3 |
| 6 | California | 3 |

Country ID in Table 2 is a foreign key referencing ID in Table 1.
From Sim(Country) in Table 1 alone, we learn only that rows 1 and 3 are duplicates (row 1 = row 3).
Using the attribute City of the child table (Table 2) for duplicate detection as well, we conclude that row 1 = row 2 = row 3 in Table 1, i.e. USA = United States = US.
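The use of child-table values can be sketched in Python as follows. The sketch assumes Jaccard similarity over the sets of city names and a threshold of 1.0; both are illustrative choices, not something the slides specify.

```python
# Table 1: country ID -> country name
countries = {1: "USA", 2: "United States", 3: "US"}

# Table 2: (ID, City, Country ID)
cities = [
    (1, "Charlotte", 1), (2, "California", 1),
    (3, "Charlotte", 2), (4, "California", 2),
    (5, "Charlotte", 3), (6, "California", 3),
]


def child_values(country_id):
    """City names of the child rows that reference the given country."""
    return {city for _, city, cid in cities if cid == country_id}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def duplicates_by_children(threshold=1.0):
    """Country ID pairs whose child descriptions are similar enough."""
    ids = sorted(countries)
    return [
        (i, j)
        for k, i in enumerate(ids)
        for j in ids[k + 1:]
        if jaccard(child_values(i), child_values(j)) >= threshold
    ]


print(duplicates_by_children())
```

All three countries share the same set of cities, so every pair is reported, which matches the conclusion USA = United States = US.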
![Page 28: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/28.jpg)
Detection From Similarity Measure
Source 1 x Source 2 -> Partitioning -> Similarity measure
The similarity measure sorts every candidate pair into one of three classes:
◦ Non-duplicates: sim < θ1
◦ Possible duplicates: θ1 <= sim <= θ2
◦ Sure duplicates: sim > θ2
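The two-threshold decision can be written as a small Python function. The threshold values in the usage lines are hypothetical; the slides give no concrete numbers.

```python
def classify_pair(sim, theta1, theta2):
    """Classify a candidate pair by its similarity and two thresholds."""
    if theta1 > theta2:
        raise ValueError("theta1 must not exceed theta2")
    if sim > theta2:
        return "sure duplicate"
    if sim < theta1:
        return "non-duplicate"
    # Everything in between is left for manual or more expensive inspection.
    return "possible duplicate"


for sim in (0.95, 0.70, 0.20):
    print(sim, classify_pair(sim, theta1=0.5, theta2=0.9))
```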
![Page 29: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/29.jpg)
Data Fusion
Objective: given a duplicate, create a single object representation while resolving conflicting data values.
Simple example:
Source 1: 98765432 | R.J.Ludlum | 3.50 | Year
Source 2: 98765432 | Trevayne | Robert Ludlum | 4.00 | Month
Resolution functions: ID, Max_length(author), Min(price), Concat(Month, Year)
![Page 30: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/30.jpg)
TYPES OF DATA CONFLICT
Uncertainty
A conflict between a non-null value and one or more null values that are all used to describe the same property of a real-world entity.
Cause: missing information, such as null values in a source or an attribute that is completely missing from a source.
NULL value vs. non-NULL value: the "easy" case.

| Title | Year | Director | Source |
|---|---|---|---|
| Snatch | 2000 | Ritchie | S1 |
| Snatch | 2000 | null | S2 |

Contradiction
A conflict between two or more different non-null values that are all used to describe the same property of the same entity.
Cause: different sources provide different values for the same attribute of a real-world entity.
Non-NULL value vs. (different) non-NULL value.

| Title | Year | Director | Source |
|---|---|---|---|
| Snatch | 2000 | Ritchie | S1 |
| Snatch | 2000 | Benaud | S2 |
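The distinction between the two conflict types can be expressed as a short Python check over the values that different sources report for one attribute. The function name and the "no conflict" label are introduced here for illustration; `None` stands for NULL.

```python
def conflict_type(values):
    """Classify the values reported for one attribute of one entity."""
    non_null = {v for v in values if v is not None}
    if len(non_null) > 1:
        return "contradiction"
    if len(non_null) == 1 and any(v is None for v in values):
        return "uncertainty"
    return "no conflict"


print(conflict_type(["Ritchie", None]))      # Director in S1 vs. S2, first table
print(conflict_type(["Ritchie", "Benaud"]))  # Director in S1 vs. S2, second table
print(conflict_type([2000, 2000]))           # Year agrees in both sources
```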
![Page 31: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/31.jpg)
NULL TYPES
unknown
◦ There is a value, but I do not know it.
◦ E.g.: unknown date of birth
not applicable
◦ There is no meaningful value.
◦ E.g.: spouse for singles
withheld
◦ There is a value, but we are not authorized to see it.
◦ E.g.: private phone line
![Page 32: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/32.jpg)
Question
________ refers to a "conflict between a non-null value and one or more null values that are all used to describe the same property of a real-world entity".
A. Contradiction
B. Uncertainty
C. Resolution
D. Ignorance
![Page 33: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/33.jpg)
Classification of Functions
Conflict resolution strategies:
◦ conflict ignorance
◦ conflict avoidance: instance based, metadata based
◦ conflict resolution: instance based (deciding, mediating), metadata based (deciding, mediating)
Functions shown in the figure: Escalate, Coalesce, Choose, ChooseDepending, Concat, AVG, SUM, MIN, MAX, Random, Vote, MostRecent, MostAbstract, MostSpecific, CommonAncestor
![Page 34: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/34.jpg)
Conflict Resolution Functions

| Function | Description | Examples |
|---|---|---|
| Min, Max, Sum, Count, Avg | Standard aggregation | NumChildren, Salary, Height |
| Random | Random choice | Shoe size |
| Longest, Shortest | Longest/shortest value | First_name |
| Choose(source) | Value from a particular source | DoB (DMV), CEO (SEC) |
| ChooseDepending(val, col) | Value depends on value chosen in other column | city & zip, e-mail & employer |
| Vote | Majority decision | Rating |
| Coalesce | First non-null value | First_name |
| Group, Concat | Group or concatenate all values | Book_reviews |
| MostRecent | Most recent (up-to-date) value | Address |
| MostAbstract, MostSpecific, CommonAncestor | Use a taxonomy / ontology | Location |
| Escalate | Export conflicting values | gender |
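A few of these functions can be sketched in Python as plain functions over a list of candidate values, with `None` standing for NULL. The names and signatures are chosen here for illustration and are not HumMer's API. The author, price and title values come from the simple data fusion example; the ratings passed to `vote` are made up.

```python
from collections import Counter


def coalesce(values):
    """First non-null value, or None if every value is null."""
    return next((v for v in values if v is not None), None)


def longest(values):
    """Longest non-null value (ties: the first one encountered)."""
    return max((v for v in values if v is not None), key=len)


def vote(values):
    """Most frequent non-null value (ties: the first one encountered)."""
    counts = Counter(v for v in values if v is not None)
    return counts.most_common(1)[0][0]


def concat(values, sep=", "):
    """Concatenate all non-null values."""
    return sep.join(str(v) for v in values if v is not None)


print(longest(["R.J.Ludlum", "Robert Ludlum"]))  # Max_length(author)
print(min([3.50, 4.00]))                         # Min(price), a standard aggregate
print(coalesce([None, "Trevayne"]))              # title is missing in the first source
print(vote([4, 5, 4]))                           # hypothetical ratings
print(concat(["Mark", "Anthony"], sep=" "))
```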
![Page 35: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/35.jpg)
Data Fusion Goals
Assume 2 sources, Source 1(A,B,C) and Source 2(A,B,D).

| Case | Source 1 | Source 2 | Tuples over (A,B,C,D) | Fused result |
|---|---|---|---|---|
| Identical tuples | a, b, - | a, b, - | a, b, -, - and a, b, -, - | a, b, -, - |
| Subsumed tuples | a, b, c | a, b, - | a, b, c, - and a, b, -, - | a, b, c, - |
| Conflicting tuples | a, b, c | a, e, d | a, b, c, - and a, e, -, d | a, f(b,e), c, d |
| Complementing tuples | a, b, c | a, b, d | a, b, c, - and a, b, -, d | a, b, c, d |
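The four cases can be told apart mechanically. The Python sketch below classifies a pair of tuples over the same schema, assuming both already describe the same real-world object and using `None` for the "-" placeholder; the function name is illustrative.

```python
def relation(t1, t2):
    """Classify two same-schema tuples that describe the same object."""
    if t1 == t2:
        return "identical"
    pairs = list(zip(t1, t2))
    if any(x is not None and y is not None and x != y for x, y in pairs):
        return "conflicting"
    only_in_t1 = any(x is not None and y is None for x, y in pairs)
    only_in_t2 = any(x is None and y is not None for x, y in pairs)
    if only_in_t1 and only_in_t2:
        return "complementing"
    # Exactly one tuple carries extra non-null values: it subsumes the other.
    return "subsumed"


print(relation(("a", "b", None, None), ("a", "b", None, None)))
print(relation(("a", "b", "c", None), ("a", "b", None, None)))
print(relation(("a", "b", "c", None), ("a", "e", None, "d")))
print(relation(("a", "b", "c", None), ("a", "b", None, "d")))
```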
![Page 36: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/36.jpg)
Relational Operators – Overview
◦ Identical tuples (duplicates): UNION, OUTER UNION
◦ Subsumed tuples (uncertainty): MINIMUM UNION
◦ Complementing tuples (uncertainty): COMPLEMENT UNION, MERGE
◦ Conflicting tuples (contradiction): MATCH, GROUP, FUSE
![Page 37: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/37.jpg)
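UNION
R

| Title | Author | ISBN |
|---|---|---|
| A | X | 123456789 |
| B | Y | 213456789 |

S

| Name | Author | ID |
|---|---|---|
| D | P | 312456789 |
| A | X | 123456789 |
| B | Y | 213456789 |

R UNION S

| Name | Author | ISBN |
|---|---|---|
| A | X | 123456789 |
| B | Y | 213456789 |
| D | P | 312456789 |

(SELECT Title AS Name, Author, ISBN FROM R)
UNION
(SELECT Name, Author, ID AS ISBN FROM S)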
![Page 38: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/38.jpg)
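MINIMUM UNION
A tuple t1 subsumes a tuple t2 if both have the same schema, t1 has fewer NULL values than t2, and t1 coincides with t2 in all of t2's non-NULL values.
R

| A | B | C |
|---|---|---|
| a | b | c |
| e | f | g |
| m | n | o |

S

| A | B | D |
|---|---|---|
| a | b | NULL |
| e | f | h |
| m | p | NULL |

Outer union of R and S:
SELECT A, B, C, NULL AS D FROM R
UNION ALL
SELECT A, B, NULL AS C, D FROM S

| A | B | C | D |
|---|---|---|---|
| a | b | c | NULL |
| a | b | NULL | NULL |
| e | f | g | NULL |
| e | f | NULL | h |
| m | n | o | NULL |
| m | p | NULL | NULL |

Minimum union: the outer union with the subsumed tuple (a, b, NULL, NULL) removed.

| A | B | C | D |
|---|---|---|---|
| a | b | c | NULL |
| e | f | g | NULL |
| e | f | NULL | h |
| m | n | o | NULL |
| m | p | NULL | NULL |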
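Plain SQL gives only the outer union; removing subsumed tuples needs an extra step. The Python sketch below shows one way to do both for the R and S tables of this slide, with `None` standing for NULL. The quadratic subsumption check is chosen for clarity, not efficiency, and the function names are illustrative.

```python
R = [("a", "b", "c"), ("e", "f", "g"), ("m", "n", "o")]   # schema (A, B, C)
S = [("a", "b", None), ("e", "f", "h"), ("m", "p", None)]  # schema (A, B, D)


def outer_union(r_rows, s_rows):
    """Pad both inputs to the schema (A, B, C, D) and concatenate them."""
    padded_r = [(a, b, c, None) for a, b, c in r_rows]
    padded_s = [(a, b, None, d) for a, b, d in s_rows]
    return padded_r + padded_s


def subsumes(t1, t2):
    """t1 has fewer NULLs than t2 and agrees with every non-NULL value of t2."""
    fewer_nulls = sum(v is None for v in t1) < sum(v is None for v in t2)
    agrees = all(y is None or x == y for x, y in zip(t1, t2))
    return fewer_nulls and agrees


def minimum_union(r_rows, s_rows):
    # dict.fromkeys drops exact duplicates while keeping the row order.
    rows = list(dict.fromkeys(outer_union(r_rows, s_rows)))
    return [t for t in rows if not any(subsumes(u, t) for u in rows)]


for row in minimum_union(R, S):
    print(row)
```

Only (a, b, NULL, NULL) disappears, because (a, b, c, NULL) carries the same values plus one more; (m, p, NULL, NULL) stays, since no other tuple agrees with it on B.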
![Page 39: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/39.jpg)
FULL DISJUNCTION
R (A, B, C)
a b c
e f g
k o
k m
S (A, B, D)
a b
e f h
m p
k q r
R |⋈| S (A, B, C, D)
a b c
e f g h
m p
k o
k m
k q r
SELECT * FROM R FULL OUTER JOIN S ON R.A = S.A AND R.B = S.B;
![Page 40: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/40.jpg)
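MERGE AND PRIORITIZED MERGE
R

| A | B | C |
|---|---|---|
| a | b | c |
| e | f | g |
| m | n | o |
| m | n | NULL |
| q | r | s |

S

| A | B | D |
|---|---|---|
| a | b | NULL |
| e | f | h |
| m | p | NULL |

R |⋈ S

| A | B | C | D |
|---|---|---|---|
| a | COAL(b,b) = b | c | NULL |
| e | COAL(f,f) = f | g | h |
| m | COAL(n,p) = n | o | NULL |
| m | COAL(n,p) = n | NULL | NULL |
| q | r | s | NULL |

R ⋈| S

| A | B | C | D |
|---|---|---|---|
| a | COAL(b,b) = b | c | NULL |
| e | COAL(f,f) = f | g | h |
| m | COAL(p,n) = p | o | NULL |
| m | COAL(p,n) = p | NULL | NULL |

SELECT * FROM R FULL OUTER JOIN S ON R.A = S.A AND R.B = S.B;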
![Page 41: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/41.jpg)
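FUSE BY
SELECT Name, RESOLVE(Age, max), RESOLVE(Address, choose(EE_Students))
FUSE FROM EE_Students, CS_Students
FUSE BY (Name)

Input tables

| Name | Age | Address |
|---|---|---|
| Ram | 20 | ABCD |
| Rajesh | 21 | EFGH |

| Name | Age | Address |
|---|---|---|
| Ram | 23 | ABCD |
| Rajesh | 20 | PQRS |

RESULT

| Name | Age | Address |
|---|---|---|
| Ram | 23 | ABCD |
| Rajesh | 21 | PQRS |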
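The semantics of this query can be imitated in Python to see how the result comes about. This is a sketch of the grouping-and-resolving idea, not HumMer's implementation. The slide does not say which input table is EE_Students; the sketch assumes it is the one containing (Rajesh, 20, PQRS), because that assignment reproduces the result shown.

```python
from collections import defaultdict

# Assumption: which table is EE_Students is inferred from the slide's result.
ee_students = [
    {"Name": "Ram", "Age": 23, "Address": "ABCD"},
    {"Name": "Rajesh", "Age": 20, "Address": "PQRS"},
]
cs_students = [
    {"Name": "Ram", "Age": 20, "Address": "ABCD"},
    {"Name": "Rajesh", "Age": 21, "Address": "EFGH"},
]


def resolve_max(tagged_values):
    """RESOLVE(col, max): largest non-null value."""
    return max(v for _, v in tagged_values if v is not None)


def choose(source):
    """RESOLVE(col, choose(source)): value coming from the named source."""
    def resolver(tagged_values):
        return next((v for src, v in tagged_values if src == source), None)
    return resolver


def fuse_by(sources, key, resolvers):
    """Group rows of all sources by key and resolve each column per group."""
    groups = defaultdict(list)
    for source_name, rows in sources.items():
        for row in rows:
            groups[row[key]].append((source_name, row))
    fused = []
    for key_value, members in groups.items():
        out = {key: key_value}
        for column, resolve in resolvers.items():
            out[column] = resolve([(src, row[column]) for src, row in members])
        fused.append(out)
    return fused


result = fuse_by(
    {"EE_Students": ee_students, "CS_Students": cs_students},
    key="Name",
    resolvers={"Age": resolve_max, "Address": choose("EE_Students")},
)
print(result)
```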
![Page 42: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/42.jpg)
FUSE BY
SELECT ID, RESOLVE(Title, Choose(IMDB)), RESOLVE(Year, Max),
       RESOLVE(Director, Concat), RESOLVE(Rating)
FUSE FROM IMDB, Filmdienst
FUSE BY (ID) ON ORDER Year DESC

IMDB

| ID | Title | Year | Director | Rating |
|---|---|---|---|---|
| 1101 | A | 1975 | Michael | Null |
| 1102 | B | 1987 | John | 5 |
| 1103 | C | 1999 | Mark | Null |

FILMBUFF

| ID | Title | Year | Director | Rating |
|---|---|---|---|---|
| 1101 | C | 1976 | King | 4 |
| 1102 | B | 1983 | Davis | Null |
| 1103 | D | 1997 | Anthony | 2 |

RESULT

| ID | Title | Year | Director | Rating |
|---|---|---|---|---|
| 1103 | C | 1999 | Mark Anthony | 2 |
| 1102 | B | 1987 | John Davis | 5 |
| 1101 | A | 1976 | Michael King | 4 |
![Page 43: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/43.jpg)
Question
Match The Following
1. (a, b, c, -) and (a, b, -, d)
2. (a, b, -, -) and (a, b, -, -)
3. (a, b, c, -) and (a, e, -, d)
4. (a, b, c, -) and (a, b, -, -)
a. Identical tuples
b. Subsumed tuples
c. Conflicting tuples
d. Complementing tuples
![Page 44: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/44.jpg)
Hummer Screenshot
![Page 45: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/45.jpg)
Hummer Screenshot
![Page 46: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/46.jpg)
Hummer Screenshot
![Page 47: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/47.jpg)
Hummer Screenshot
![Page 48: DATA FUSION](https://reader035.fdocuments.net/reader035/viewer/2022062310/568166a3550346895dda8fb3/html5/thumbnails/48.jpg)