Joachim Biskup Universität Dortmund and David W. Embley Brigham Young University

Post on 19-Jan-2016

Extracting Information from Heterogeneous Information Sources Using Ontologically Specified Target Views

Joachim Biskup, Universität Dortmund

and

David W. Embley, Brigham Young University

Funded by NSF

Information Exchange (Source, Target)

Information Extraction

Schema Matching

Leverage this …

… to do this

Presentation Outline

• Overview
• Matching (Direct)
• Matching (Derived)
• Matching Algorithm
• Summary

Requirements

1. f is an injective function.
2. f maps object sets to object sets and relationship sets to relationship sets.
3. f respects relationship-set arities.
4. f respects referential integrity.
5. f respects types.
6. f respects real-world identity.
7. f's coercions are G/S compatible.
8. f respects subset constraints.
9. f respects mutual-exclusion constraints.
10. f respects union constraints.
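As a rough illustration of requirements 1 to 3 only, the sketch below checks a candidate mapping f for injectivity, for kind preservation (object sets to object sets, relationship sets to relationship sets), and for matching arities. The `Element` class, the `violations` helper, and the sample names are hypothetical and are not part of the presented framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A schema element: an object set (arity 1) or a relationship set (arity >= 2)."""
    name: str
    kind: str      # "object" or "relationship"
    arity: int = 1


def violations(f):
    """Return the violated requirement numbers (1 to 3) for a mapping f: target -> source."""
    bad = set()
    # Requirement 1: injective, so no two target elements share a source element.
    if len(set(f.values())) != len(f):
        bad.add(1)
    for target, source in f.items():
        if target.kind != source.kind:    # Requirement 2: kinds are preserved
            bad.add(2)
        if target.arity != source.arity:  # Requirement 3: arities are preserved
            bad.add(3)
    return sorted(bad)


# Hypothetical target (t) and source (s) elements.
city_t = Element("City", "object")
map_t = Element("City Map", "object")
has_t = Element("City has City Map", "relationship", 2)
city_s = Element("City", "object")
has_s = Element("City has Map", "relationship", 2)

ok = {city_t: city_s, has_t: has_s}
broken = {city_t: city_s, map_t: city_s, has_t: city_s}

print(violations(ok))
print(violations(broken))
```

Requirements 4 to 10 depend on populated data and on the model's constraints, so they are left out of this sketch.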

User Interaction (IDS Statements)

• Issue
– Explains the issue
– Example: units, may need transformation

• Default
– Explains the default option
– Example: if no transformation, no conversion

• Suggestion
– Gives a suggestion about how to resolve the issue
– Example: if needed, specify the conversion
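One possible way to carry an Issue/Default/Suggestion triple in code is a small record type. The class name `IDSStatement` and its `render` method are assumptions made for this sketch; the slides do not prescribe a representation.

```python
from dataclasses import dataclass


@dataclass
class IDSStatement:
    """Hypothetical container for one Issue/Default/Suggestion statement."""
    issue: str
    default: str
    suggestion: str

    def render(self) -> str:
        # One line per part, in the order shown on the slide.
        return (f"Issue: {self.issue}\n"
                f"Default: {self.default}\n"
                f"Suggestion: {self.suggestion}")


# The units example from the slide, phrased as full sentences.
units = IDSStatement(
    issue="Units may differ and may need a transformation.",
    default="If no transformation is given, no conversion is applied.",
    suggestion="If needed, specify the conversion.",
)
print(units.render())
```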

Theorem

Let f be the generated mapping from target t to source s, populated such that s has a valid interpretation. Let t’ be the submodel of t populated from s by f. Then t’ has a valid interpretation.

Proof: the paper is the proof …

Target (Graphical View)

Target (Textual View)

Source Example (Assumed to be Populated)

Matching (Direct)

• Object Sets

• Relationship Sets

Object-Set Type Compatibility

<a, b>
1. type(a) = type(b)
2. type(a) ⊃ type(b)
3. type(a) ⊂ type(b)
4. type(a) ≠ type(b)
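The four cases can be read, judging from the examples on the slides that follow, as: same type, target type wider than source type, target type narrower than source type, and mismatch. The sketch below classifies a pair under that reading. The containment table `NARROWER_THAN` and the function `type_case` are hypothetical, and the pair is taken as <a, b> = <target, source>.

```python
# Hypothetical containment facts, written as (narrower type, wider type).
NARROWER_THAN = {("Integer", "Real"), ("Image", "Video")}


def type_case(target_type, source_type):
    """Classify the pair <a, b> = <target, source> into one of four cases."""
    if target_type == source_type:
        return "same"
    if (source_type, target_type) in NARROWER_THAN:
        return "target wider"    # e.g. target Real, source Integer
    if (target_type, source_type) in NARROWER_THAN:
        return "source wider"    # e.g. target Integer, source Real
    return "mismatch"            # e.g. target Image, source String


for pair in [("String", "String"), ("Real", "Integer"),
             ("Integer", "Real"), ("Image", "String")]:
    print(pair, type_case(*pair))
```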

type(a) = type(b)

• Same type
– string = string, but Airport ≠ Head Of State
– Need better matching techniques

• Same type, different units
– Size and Nr Sq Km
– Need unit conversion

• Same type, different format
– Date and Date, but 01/02/2002 versus Jan 2, 2002
– Need format conversion

• Same type, same units and format, different assumptions
– Altitude and Altitude, but altitude of aircraft and spacecraft differ
– Need same assumptions

• Same type, same units and format, same assumption, OIDs
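The unit and format conversions mentioned above could look roughly like the following. Two assumptions are made here that the slides do not state: that 01/02/2002 is in month/day/year order, and that the source Size is in square miles. The function names are hypothetical.

```python
from datetime import datetime

# Fixed English abbreviations, so the output does not depend on the locale.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

SQ_KM_PER_SQ_MILE = 2.589988110336  # 1.609344 km per mile, squared


def reformat_date(text):
    """Format conversion: 'MM/DD/YYYY' (assumed order) to 'Mon D, YYYY'."""
    d = datetime.strptime(text, "%m/%d/%Y")
    return f"{MONTHS[d.month - 1]} {d.day}, {d.year}"


def sq_miles_to_sq_km(area):
    """Unit conversion: square miles (assumed source unit) to square kilometers."""
    return area * SQ_KM_PER_SQ_MILE


print(reformat_date("01/02/2002"))
print(sq_miles_to_sq_km(10))
```

If the source dates were day/month/year instead, the same input would mean 1 February 2002, which is why the assumption has to be settled before converting.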

type(a) ⊃ type(b) and type(a) ⊂ type(b)

• Real ⊃ Integer or Video ⊃ Image
– Target has greater discriminating power
– Can add .0 or make a video of a single image (?)

• Integer ⊂ Real or Image ⊂ Video
– Source has greater discriminating power
– Can round off or select one of the frames (?)
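A minimal sketch of the four coercions named above, assuming a video is modeled as a list of frames. All function names are hypothetical, and the choice of frame 0 and of Python's built-in rounding are illustrative defaults, not choices made in the slides.

```python
def integer_to_real(n):
    """Source Integer into target Real: lossless, 'add .0'."""
    return float(n)


def real_to_integer(x):
    """Source Real into target Integer: lossy, 'round off'.

    Python's round() rounds halves to the nearest even integer.
    """
    return round(x)


def image_to_video(image):
    """Source Image into target Video: a video of a single image."""
    return [image]


def video_to_image(frames, index=0):
    """Source Video into target Image: select one of the frames."""
    return frames[index]


print(integer_to_real(2))
print(real_to_integer(2.7))
print(video_to_image(image_to_video("frame-a")))
```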

type(a) ≠ type(b)

• Image ≠ String
– Mismatch, even if same attribute (e.g., both City)
– Types can help discard potential matches

• String(5) ≠ Integer
– But suppose the integer is 2
– Might work, but is “2.000” ok?
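The String(5) and Integer question can be made concrete by trying to read a string value as an integer. The sketch assumes the direction string to integer, which the slide leaves open, and offers a strict and a lenient policy; `string_to_integer` and its `lenient` flag are hypothetical.

```python
def string_to_integer(text, lenient=False):
    """Try to read a String(5) value as an Integer; return None on mismatch."""
    if len(text) > 5:
        return None              # does not fit String(5) at all
    try:
        return int(text)         # strict reading: "2" works, "2.000" does not
    except ValueError:
        if not lenient:
            return None
    try:
        number = float(text)     # lenient reading: accept integral decimals
    except ValueError:
        return None
    return int(number) if number.is_integer() else None


print(string_to_integer("2"))
print(string_to_integer("2.000"))
print(string_to_integer("2.000", lenient=True))
print(string_to_integer("Paris", lenient=True))
```

Whether the lenient reading is acceptable is exactly the kind of decision the slide flags with its question.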

Relationship Match Requirements

• Referential integrity

• Constraints
– Cardinality
– Mandatory/Optional

Referential Integrity

Figure: Target and Source diagrams with object sets a, b, a’, b’, …, a’’.

The types of a, a’, and a’’ can all be different, but not arbitrary. Example: a (String), a’ (Integer), a’’ (Real).

Relationship-Set Constraint Compatibility

<a, b>
1. constr(a) <=> constr(b)
2. (constr(a) <= constr(b)) ∧ ¬(constr(a) => constr(b))
3. ¬(constr(a) <= constr(b)) ∧ (constr(a) => constr(b))
4. ¬(constr(a) <= constr(b)) ∧ ¬(constr(a) => constr(b))
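One way to make the implications between constr(a) and constr(b) checkable is to model a participation constraint as a (min, max) pair, with `None` meaning no upper bound, and to read "c implies d" as interval containment. This modeling, the helper names, and the descriptive case labels are assumptions for this sketch; a is the target and b is the source.

```python
def implies(c, d):
    """True if every participation count allowed by c is also allowed by d."""
    (c_min, c_max), (d_min, d_max) = c, d
    lower_ok = c_min >= d_min
    upper_ok = d_max is None or (c_max is not None and c_max <= d_max)
    return lower_ok and upper_ok


def constraint_case(a, b):
    """Classify target constraint a against source constraint b."""
    a_to_b, b_to_a = implies(a, b), implies(b, a)
    if a_to_b and b_to_a:
        return "equivalent"
    if b_to_a:
        return "source stricter"   # the source constraint implies the target's
    if a_to_b:
        return "target stricter"   # the target constraint implies the source's
    return "incomparable"


MANY = (0, None)          # optional, unbounded
AT_MOST_ONE = (0, 1)      # optional, functional
AT_LEAST_ONE = (1, None)  # mandatory, unbounded

print(constraint_case(MANY, AT_MOST_ONE))          # target expects many, source has at most one
print(constraint_case(AT_MOST_ONE, MANY))          # target expects one, source can supply many
print(constraint_case(AT_LEAST_ONE, AT_MOST_ONE))  # neither constraint implies the other
```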

constr(a) <=> constr(b)

Figure: Person and Car connected by the relationship sets “owns” and “drives”; Person and Car connected by a relationship set labeled “?”.

Need more information to resolve: Perhaps “?” is “purchased.”

(constr(a) <= constr(b)) ∧ ¬(constr(a) => constr(b))

Figure: two relationship sets between City and City Map, labeled a and b.

The target (a) expects many maps, but the source can’t supply them.

¬(constr(a) <= constr(b)) ∧ (constr(a) => constr(b))

Figure: two relationship sets between City and City Map, labeled a and b.

The target (a) expects one map, but the source can supply many.

¬(constr(a) <= constr(b)) ∧ ¬(constr(a) => constr(b))

Figure: two relationship sets between City and City Map, labeled a and b.

The target (a) expects at least one and potentially many maps, but the source may have none or at most one.

Matching (Derived)

• Generalization/Specialization
• Composite Values
• Derived Relationship Sets
• Displayable/Nondisplayable Object Sets

Generalization/Specialization

• For a target object set, a source object set may:
– have no overlap (just ignore)
– have a proper subset (accept or find missing generalization)
– have the same values (direct match)
– have a proper superset (hard, except for roles)
– overlap (like proper subset and proper superset)

• Consider roles and missing generalizations
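The five overlap cases listed above map directly onto set comparisons. The sketch assumes the value sets of both object sets are available as plain Python collections; `overlap_case` and the sample city names are hypothetical.

```python
def overlap_case(target_values, source_values):
    """Classify how a source object set's values relate to a target object set's."""
    t, s = set(target_values), set(source_values)
    if not (t & s):
        return "no overlap"        # just ignore
    if s == t:
        return "same values"       # direct match
    if s < t:
        return "proper subset"     # accept or find missing generalization
    if s > t:
        return "proper superset"   # hard, except for roles
    return "overlap"               # like proper subset and proper superset


target = {"Berlin", "Dortmund", "Provo"}
print(overlap_case(target, {"Tokyo"}))
print(overlap_case(target, {"Berlin", "Provo"}))
print(overlap_case(target, {"Berlin", "Dortmund", "Provo"}))
print(overlap_case(target, {"Berlin", "Dortmund", "Provo", "Tokyo"}))
print(overlap_case(target, {"Berlin", "Tokyo"}))
```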

Roles

Figure: target with City, Travel Video, and the role Video With City Scene; source with City, Clip: Video, and the role Video With City Scene.

Missing Generalization

Figure: target and source diagrams with City Map, Country Map, City Map: Image, Country Map: Image, and the generalization Map: Image.

Composite Values

• Composite in Source (split)
• Composite in Target (merge)
• Examples of Derived Relationships

Composite in Source

Figure: target Video with Nr Hours and Nr Minutes; source Video with Time, split into Nr Hours and Nr Minutes.

Note also that we generated a source path.
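Splitting a composite source value could be sketched as follows, assuming the source Time is stored as an "H:MM" string. That storage format and the name `split_time` are assumptions; the slides do not say how Time is encoded.

```python
def split_time(time_text):
    """Split a composite 'H:MM' Time value into (Nr Hours, Nr Minutes)."""
    hours, minutes = time_text.split(":")
    return int(hours), int(minutes)


nr_hours, nr_minutes = split_time("1:45")
print(nr_hours, nr_minutes)
```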

Composite in Source

Figure: target Video with Nr Hours and Nr Minutes; source Video with Nr Hours and Nr Minutes.

Composite in Target

Figure: target and source Video diagrams with Nr Hours, Nr Minutes, and Time.

Composite in Target

Figure: target and source Video diagrams with Time.
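Merging separate source values into a composite target value could look like the sketch below, which builds an "H:MM" Time from Nr Hours and Nr Minutes. The output format and the name `merge_time` are assumptions made for illustration.

```python
def merge_time(nr_hours, nr_minutes):
    """Merge Nr Hours and Nr Minutes into a composite 'H:MM' Time value."""
    if nr_hours < 0 or not 0 <= nr_minutes < 60:
        raise ValueError("hours must be >= 0 and minutes must be in 0..59")
    return f"{nr_hours}:{nr_minutes:02d}"


print(merge_time(1, 45))
print(merge_time(0, 7))
```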

Displayable/Nondisplayable Object-Set Matches

• Nondisplayable in Source: find a key

• Nondisplayable in Target: create a key

Nondisplayable in Source

Figure: target Airport and source Airport, with City, Airline, and the relationship sets “flys to” and “serves”.

No Key: Discard Match

Nondisplayable in Source

Figure: target Airport and source Airport, with City, Airline, Airport Name, and the relationship sets “flys to” and “serves”.

One Key: Choose it

Nondisplayable in Source

Figure: target Airport and source Airport, with City, Airline, Airport Name, Airport Code, and the relationship sets “flys to” and “serves”.

Two or more Keys: Choose One
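The three outcomes for a nondisplayable source object set (no key, one key, two or more keys) can be sketched as a small selection function. A surrogate-key helper is included for the nondisplayable-in-target case, where a key has to be created. Both helpers, the `preferred` parameter, and the "oid" prefix are hypothetical.

```python
from itertools import count


def choose_key(candidate_keys, preferred=None):
    """Pick a key for a nondisplayable source object set, or None to discard the match."""
    if not candidate_keys:
        return None                  # no key: discard match
    if preferred in candidate_keys:
        return preferred             # two or more keys: an explicit choice wins
    return candidate_keys[0]         # one key, or fall back to the first candidate


def create_keys(objects, prefix="oid"):
    """Create surrogate keys for a nondisplayable target object set."""
    counter = count(1)
    return {obj: f"{prefix}{next(counter)}" for obj in objects}


print(choose_key([]))
print(choose_key(["Airport Name"]))
print(choose_key(["Airport Name", "Airport Code"], preferred="Airport Code"))
print(create_keys(["airport-a", "airport-b"]))
```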

Matching Algorithm

Sample Match Table

Pictorial View of Match Table

target

source

Summary

Concluding Remarks

• QED (the theorem holds)

Let f be the generated mapping from target t to source s, populated such that s has a valid interpretation. Let t’ be the submodel of t populated from s by f. Then t’ has a valid interpretation.

Proof: the paper is the proof …

Pictorial View of Match Table

t = target

s = source

f = the mapping

t’ has a valid interpretation

t’ = submodel

• Merge (several sources)
– All sources extracted to same view
– Union merge
  • Object identity problems
  • Constraint problems

• Source Modeling (convert to OSM)
• Framework defined, but not implemented
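To see why a union merge raises object identity problems, consider extracting (City, Airport Code) pairs from two sources into the same target view. The sample rows and the name `union_merge` are made up for this sketch, and since the framework is described as not implemented, this is only an illustration of the idea.

```python
def union_merge(*extractions):
    """Union the rows that several sources contributed to the same target view."""
    merged = set()
    for rows in extractions:
        merged |= set(rows)
    return merged


# Hypothetical extractions of (City, Airport Code) from two sources.
source_1 = [("Salt Lake City", "SLC")]
source_2 = [("Salt Lake City", "SLC"), ("Salt Lake", "SLC")]

merged = union_merge(source_1, source_2)
print(sorted(merged))

# Identical rows collapse, but "Salt Lake" and "Salt Lake City" stay distinct.
cities_for_slc = {city for city, code in merged if code == "SLC"}
print(len(cities_for_slc))
```

The two spellings survive as separate objects, so a target constraint such as "each Airport Code belongs to one City" would be violated after the merge even though each source satisfies it on its own.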