Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Extracting Information from Heterogeneous Information Sources Using Ontologically Specified Target Views Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University Funded by NSF


Page 1: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Extracting Information from Heterogeneous Information Sources Using Ontologically Specified Target Views

Joachim Biskup

Universität Dortmund

and

David W. Embley

Brigham Young University

Funded by NSF

Page 2: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Information Exchange

[Figure: Source and Target, with Information Extraction and Schema Matching. Callouts: "Leverage this …" and "… to do this".]

Page 3: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Presentation Outline

• Overview
• Matching (Direct)
• Matching (Derived)
• Matching Algorithm
• Summary

Page 4: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University
Page 5: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Requirements

1. f is an injective function.
2. f maps object sets to object sets and relationship sets to relationship sets.
3. f respects relationship-set arities.
4. f respects referential integrity.
5. f respects types.
6. f respects real-world identity.
7. f’s coercions are G/S compatible.
8. f respects subset constraints.
9. f respects mutual-exclusion constraints.
10. f respects union constraints.
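A minimal sketch of how the first three requirements might be checked mechanically. The `Element` class, the `kind` strings, and `check_mapping` are hypothetical names introduced only for illustration; requirements 4 through 10 need population data and are not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical model element: an object set or a relationship set."""
    name: str
    kind: str       # "object_set" or "relationship_set"
    arity: int = 0  # number of connected object sets; 0 for object sets


def check_mapping(f: dict[Element, Element]) -> list[str]:
    """Return violations of requirements 1-3 for a candidate mapping f."""
    problems = []
    # Requirement 1: f is injective, so no two targets share a source.
    if len(set(f.values())) != len(f):
        problems.append("not injective")
    for target, source in f.items():
        # Requirement 2: object sets map to object sets, rel. sets to rel. sets.
        if target.kind != source.kind:
            problems.append(f"{target.name}: kind mismatch")
        # Requirement 3: relationship-set arities must agree.
        elif target.kind == "relationship_set" and target.arity != source.arity:
            problems.append(f"{target.name}: arity mismatch")
    return problems
```

An empty result means only that these three structural checks pass, not that the mapping is valid.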

Page 6: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

User Interaction (IDS Statements)

• Issue
– Explains the issue
– Example: units, may need transformation

• Default
– Explains the default option
– Example: if no transformation, no conversion

• Suggestion
– Gives a suggestion about how to resolve the issue
– Example: if needed, specify the conversion
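One way to picture an IDS statement is as a three-field record. The sketch below reads IDS as Issue/Default/Suggestion, which the three bullets suggest; the class name `IDSStatement` and the example wording are assumptions, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class IDSStatement:
    """Hypothetical record for one Issue/Default/Suggestion statement."""
    issue: str
    default: str
    suggestion: str

    def render(self) -> str:
        # One line per part, in the order shown to the user.
        return (f"Issue: {self.issue}\n"
                f"Default: {self.default}\n"
                f"Suggestion: {self.suggestion}")


# Example wording paraphrased from the units example on this slide.
units = IDSStatement(
    issue="Units may need transformation.",
    default="If no transformation is given, no conversion is applied.",
    suggestion="If needed, specify the conversion.",
)
print(units.render())
```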

Page 7: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Theorem

Let f be the generated mapping from target t to source s, populated such that s has a valid interpretation. Let t’ be the submodel of t populated from s by f. Then t’ has a valid interpretation.

Proof: the paper is the proof …

Page 8: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Target (Graphical View)

Page 9: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Target (Textual View)

Page 10: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Source Example (Assumed to be Populated)

Page 11: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Matching (Direct)

• Object Sets

• Relationship Sets

Page 12: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Object-Set Type Compatibility

<a, b>
1. type(a) = type(b)
2. type(a) ⊃ type(b)
3. type(a) ⊂ type(b)
4. type(a) ≠ type(b)

Page 13: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

type(a) = type(b)

• Same type
– string = string, but Airport ≠ Head Of State
– Need better matching techniques

• Same type, different units
– Size vs. Nr Sq Km
– Need unit conversion

• Same type, different format
– Date = Date, but 01/02/2002 ≠ Jan 2, 2002
– Need format conversion

• Same type, same units and format, different assumptions
– Altitude = Altitude, but altitude of aircraft and spacecraft differ
– Need same assumptions

• Same type, same units and format, same assumptions, OIDs
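The unit and format cases can be illustrated with two small conversion functions. This is a sketch under stated assumptions: the slide does not say which unit Size uses, so square miles is a hypothetical choice, and the date is read as month/day/year because the slide pairs 01/02/2002 with Jan 2, 2002.

```python
from datetime import datetime

SQ_KM_PER_SQ_MILE = 2.589988110336  # exact, since 1 mile = 1.609344 km
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def sq_miles_to_sq_km(size_sq_miles: float) -> float:
    """Unit conversion: Size (assumed to be in square miles) to Nr Sq Km."""
    return size_sq_miles * SQ_KM_PER_SQ_MILE


def us_date_to_long(date_text: str) -> str:
    """Format conversion: '01/02/2002' (month/day/year) to 'Jan 2, 2002'."""
    d = datetime.strptime(date_text, "%m/%d/%Y")
    return f"{MONTHS[d.month - 1]} {d.day}, {d.year}"


print(us_date_to_long("01/02/2002"))
```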

Page 14: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

type(a) ⊃ type(b) and type(a) ⊂ type(b)

• Real ⊃ Integer or Video ⊃ Image
– Target has greater discriminating power
– Can add .0 or make a video of a single image (?)

• Integer ⊂ Real or Image ⊂ Video
– Source has greater discriminating power
– Can round off or select one of the frames (?)
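The four coercions mentioned on this slide can be sketched as follows. Representing an image as `bytes` and a video as a list of frames is an assumption made only for the example, and all function names are hypothetical.

```python
def integer_to_real(value: int) -> float:
    """Target Real, source Integer: 'add .0'."""
    return float(value)


def real_to_integer(value: float) -> int:
    """Target Integer, source Real: 'round off' (the fraction is lost).

    Python's round() sends exact .5 ties to the nearest even integer.
    """
    return round(value)


def image_to_video(image: bytes) -> list[bytes]:
    """Target Video, source Image: a video consisting of a single frame."""
    return [image]


def video_to_image(video: list[bytes], frame: int = 0) -> bytes:
    """Target Image, source Video: select one of the frames."""
    return video[frame]
```

The first and third functions lose nothing; the second and fourth discard information, which is why the slide marks them with "(?)".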

Page 15: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

type(a) ≠ type(b)

• Image ≠ String
– Mismatch, even if same attribute (e.g. both City)
– Types can help discard potential matches

• String(5) ≠ Integer
– But suppose the integer is 2
– Might work, but is “2.000” ok?
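The String(5) case can be made concrete with a length check. This sketch only tests whether a rendering fits; whether a rendering such as "2.000" is acceptable remains a question for the user. The function name and the candidate renderings are hypothetical.

```python
def fits_string(value: object, max_len: int) -> bool:
    """True if the textual rendering of value fits in String(max_len)."""
    return len(str(value)) <= max_len


# Hypothetical renderings of the integer 2 for a String(5) target.
for rendering in ("2", "2.000", "00002"):
    print(rendering, fits_string(rendering, 5))
```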

Page 16: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Relationship Match Requirements

• Referential integrity

• Constraints
– Cardinality
– Mandatory/Optional

Page 17: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Referential Integrity

[Figure: Target with a and b; Source with a’ and b’, and a’’ further along (. . .)]

The types of a, a’, and a’’ can all be different, but not arbitrary. Example: a (String), a’ (Integer), a’’ (Real).
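One possible reading of the referential-integrity requirement is that, whenever a relationship set is matched, the object sets it connects must be matched to the object sets its image connects. The sketch below checks that reading only; the dictionary encoding and all names are assumptions.

```python
def respects_referential_integrity(
    f: dict[str, str],
    target_ends: dict[str, tuple[str, ...]],
    source_ends: dict[str, tuple[str, ...]],
) -> bool:
    """Check that mapped relationship sets connect mapped object sets.

    f maps target element names to source element names; target_ends and
    source_ends give, per relationship set, the object sets it connects.
    """
    for rel, ends in target_ends.items():
        if rel not in f:
            continue  # an unmapped relationship set imposes no requirement
        if any(end not in f for end in ends):
            return False  # a connected object set has no image
        mapped = sorted(f[end] for end in ends)
        if mapped != sorted(source_ends.get(f[rel], ())):
            return False  # the images are not what the source rel. set connects
    return True
```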

Page 18: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Relationship-Set Constraint Compatibility

<a, b>
1. constr(a) <=> constr(b)
2. (constr(a) <= constr(b)) ∧ ¬(constr(a) => constr(b))
3. ¬(constr(a) <= constr(b)) ∧ (constr(a) => constr(b))
4. ¬(constr(a) <= constr(b)) ∧ ¬(constr(a) => constr(b))
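A sketch of the four-way comparison, assuming each constraint can be modeled as a participation interval (min, max) and that one constraint implies another when its interval is contained in the other's. Both the interval model and the result labels are assumptions for illustration, not the paper's formalism.

```python
import math

Interval = tuple[int, float]  # (min, max) participation; max may be math.inf


def implies(c1: Interval, c2: Interval) -> bool:
    """constr(c1) => constr(c2): every count allowed by c1 is allowed by c2."""
    return c2[0] <= c1[0] and c1[1] <= c2[1]


def classify(a: Interval, b: Interval) -> str:
    """Compare the target constraint a with the source constraint b."""
    a_implied_by_b = implies(b, a)  # constr(a) <= constr(b)
    a_implies_b = implies(a, b)     # constr(a) => constr(b)
    if a_implied_by_b and a_implies_b:
        return "equivalent"
    if a_implied_by_b:
        return "source stronger"
    if a_implies_b:
        return "target stronger"
    return "incomparable"


# Target allows many maps per city, source allows at most one.
print(classify((0, math.inf), (0, 1)))
```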

Page 19: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

constr(a) <=> constr(b)

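[Figure: Person and Car connected by the relationship sets owns and drives, matched against Person and Car connected by a single relationship set “?”]

Need more information to resolve: Perhaps “?” is “purchased.”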

Page 20: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

(constr(a) <= constr(b)) ∧ ¬(constr(a) => constr(b))

[Figure: City and City Map relationship sets a (target) and b (source)]

The target (a) expects many maps, but the source can’t supply them.

Page 21: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

¬(constr(a) <= constr(b)) ∧ (constr(a) => constr(b))

[Figure: City and City Map relationship sets a (target) and b (source)]

The target (a) expects one map, but the source can supply many.

Page 22: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

¬(constr(a) <= constr(b)) ∧ ¬(constr(a) => constr(b))

[Figure: City and City Map relationship sets a (target) and b (source)]

The target (a) expects at least one and potentially many maps, but the source may have none or at most one.

Page 23: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Matching (Derived)

• Generalization/Specialization
• Composite Values
• Derived Relationship Sets
• Displayable/Nondisplayable Object Sets

Page 24: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Generalization/Specialization

• For a target object set, a source object set may:
– have no overlap (just ignore)
– have a proper subset (accept or find missing generalization)
– have the same values (direct match)
– have a proper superset (hard, except for roles)
– overlap (like proper subset and proper superset)

• Consider roles and missing generalizations
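The five cases can be told apart with ordinary set comparisons once both object sets are populated. A minimal sketch, assuming values are directly comparable; the function name and the returned labels are chosen to mirror the bullets and are otherwise hypothetical.

```python
def classify_overlap(target: set, source: set) -> str:
    """Classify the source object set's values relative to the target's."""
    if target.isdisjoint(source):
        return "no overlap"       # just ignore
    if source == target:
        return "same values"      # direct match
    if source < target:
        return "proper subset"    # accept or find missing generalization
    if source > target:
        return "proper superset"  # hard, except for roles
    return "overlap"              # like proper subset and proper superset


print(classify_overlap({"Paris", "Rome"}, {"Paris"}))
```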

Page 25: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Roles

[Figure: target and source diagrams with the object sets City, Travel Video, and Clip: Video, and the role Video With City Scene]

Page 26: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Missing Generalization

[Figure: target and source diagrams with the object sets City Map, Country Map, City Map: Image, and Country Map: Image, and the generalization Map: Image]

Page 27: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Composite Values

• Composite in Source (split)
• Composite in Target (merge)
• Examples of Derived Relationships
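Split and merge can be sketched with the Time, Nr Hours, and Nr Minutes example used on the following slides. The textual layout of Time is not given in the slides, so the "H:MM" format below is a hypothetical choice, as are the function names.

```python
def split_time(time_text: str) -> tuple[int, int]:
    """Composite in source: split Time ('H:MM') into Nr Hours, Nr Minutes."""
    hours, minutes = time_text.split(":")
    return int(hours), int(minutes)


def merge_time(nr_hours: int, nr_minutes: int) -> str:
    """Composite in target: merge Nr Hours and Nr Minutes into Time ('H:MM')."""
    return f"{nr_hours}:{nr_minutes:02d}"


print(split_time("1:45"))
print(merge_time(1, 5))
```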

Page 28: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Composite in Source

[Figure: target Video with Nr Hours and Nr Minutes; source Video with Time, and Nr Hours and Nr Minutes under Time]

Note also that we generated a source path.

Page 29: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Composite in Source

[Figure: target Video with Nr Hours and Nr Minutes; source Video with Nr Hours and Nr Minutes]

Page 30: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Composite in Target

[Figure: target and source diagrams with Video, Time, Nr Hours, and Nr Minutes]

Page 31: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Composite in Target

[Figure: target and source diagrams with Video and Time]

Page 32: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Displayable/Nondisplayable Object-Set Matches

• Nondisplayable in Source: find a key

• Nondisplayable in Target: create a key
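For the second bullet, creating a key could be as simple as generating surrogate identifiers. This is only an illustration; the slides do not say how keys are created, and `make_key_generator` and its prefix scheme are hypothetical.

```python
from itertools import count
from typing import Callable


def make_key_generator(prefix: str) -> Callable[[], str]:
    """Return a function that yields fresh surrogate keys: prefix1, prefix2, ..."""
    counter = count(1)

    def next_key() -> str:
        return f"{prefix}{next(counter)}"

    return next_key


new_airport_key = make_key_generator("Airport")
print(new_airport_key(), new_airport_key())
```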

Page 33: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Nondisplayable in Source

[Figure: target Airport and source Airport, with City, Airline, and the relationship sets flies to and serves]

No Key: Discard Match

Page 35: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Nondisplayable in Source

[Figure: target Airport and source Airport, with City, Airline, Airport Name, and the relationship sets flies to and serves]

One Key: Choose it

Page 37: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Nondisplayable in Source

[Figure: target Airport and source Airport, with City, Airline, Airport Name, Airport Code, and the relationship sets flies to and serves]

Two or more Keys: Choose One
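The three cases for a nondisplayable source object set (no key, one key, two or more keys) reduce to a small selection rule. The sketch below picks the first candidate when several exist; the slides do not say which one to prefer, so that tie-break is an assumption, as is the function name.

```python
from typing import Optional, Sequence


def choose_key(candidate_keys: Sequence[str]) -> Optional[str]:
    """Pick a key for a nondisplayable source object set.

    Returns None when there is no key, meaning the match is discarded.
    """
    if not candidate_keys:
        return None           # no key: discard match
    return candidate_keys[0]  # one key: choose it; several: choose one


print(choose_key([]))
print(choose_key(["Airport Name"]))
print(choose_key(["Airport Name", "Airport Code"]))
```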

Page 39: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Matching Algorithm

Page 40: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University
Page 41: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Sample Match Table

Page 42: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Pictorial View of Match Table

[Figure: target and source]

Page 43: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Summary

Page 44: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Concluding Remarks

• QED (the theorem holds)

Let f be the generated mapping from target t to source s, populated such that s has a valid interpretation. Let t’ be the submodel of t populated from s by f. Then t’ has a valid interpretation.

Proof: the paper is the proof …

Page 45: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Pictorial View of Match Table

[Figure labels:]
• t = target
• s = source
• f = the mapping
• t’ = submodel
• t’ has a valid interpretation

Page 46: Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Concluding Remarks

• QED (the theorem holds)
• Merge (several sources)
– All sources extracted to same view
– Union merge
  • Object identity problems
  • Constraint problems
• Source Modeling (convert to OSM)
• Framework defined, but not implemented
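A small sketch of why a union merge raises object identity problems. Each source is assumed to have been extracted into the same target view as a set of tuples; the tuple layout and the data are made up for illustration.

```python
def union_merge(*extractions: set[tuple]) -> set[tuple]:
    """Union the tuples extracted from several sources into one view."""
    merged: set[tuple] = set()
    for extraction in extractions:
        merged |= extraction
    return merged


# Hypothetical (airport identifier, city) tuples from two sources.
source_1 = {("SLC", "Salt Lake City")}
source_2 = {("Salt Lake City International", "Salt Lake City")}

merged = union_merge(source_1, source_2)
# Both tuples survive: a plain union cannot tell that they describe the same
# real-world airport, which is the object identity problem.
print(len(merged))
```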