A Semantic Approach to Discovering Schema Mapping

Post on 25-Feb-2016


A Semantic Approach to Discovering Schema Mapping. Yuan An, Alex Borgida, Renee J. Miller, and John Mylopoulos. Presented by: Kristine Monteith. Overview. Goal of the paper: Matching schemas with more than just simple element correspondence (e.g. Can we improve on a naïve mapping?).

Transcript of A Semantic Approach to Discovering Schema Mapping

A SEMANTIC APPROACH TO DISCOVERING SCHEMA MAPPING
Yuan An, Alex Borgida, Renee J. Miller, and John Mylopoulos
Presented by: Kristine Monteith

OVERVIEW
Goal of the paper: Matching schemas with more than just simple element correspondence (e.g. Can we improve on a naïve mapping?)

OVERVIEW
Approach: Derive a conceptual model for the semantics in a table, and match the conceptual model in the source schema to the conceptual model in the target schema

e.g. Can we figure out that a source schema like this:
person(pname), writes(pname, bid), book(bid), soldAt(bid, sid), bookstore(sid)
can match a target schema like this:
hasBookSoldAt(aname, sid)

EXAMPLE 1

BASELINE SOLUTION: REFERENTIAL INTEGRITY CONSTRAINTS
Find correspondences
v1: connect person.pname to hasBookSoldAt.aname
v2: connect bookstore.sid to hasBookSoldAt.sid

Create logical relations using referential constraints
S1: person(pname) |X| writes(pname, bid) |X| book(bid)
S2: book(bid) |X| soldAt(bid, sid) |X| bookstore(sid)
S3: person(pname)
S4: bookstore(sid)

Look at the target
T1: hasBookSoldAt(aname, sid)

Look at each pair of source and target relations and check which correspondences are “covered”
<S1,T1,v1> <S2,T1,v2> <S3,T1,v1> <S4,T1,v2>
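The coverage check can be sketched as follows. This is a minimal illustration, not the baseline tool's actual code: the column sets are read off the S1 to S4 and T1 definitions above, and the function name covered_triples is hypothetical.

```python
# Columns appearing in each logical relation (taken from S1..S4 and T1).
source_relations = {
    "S1": {"person.pname", "writes.pname", "writes.bid", "book.bid"},
    "S2": {"book.bid", "soldAt.bid", "soldAt.sid", "bookstore.sid"},
    "S3": {"person.pname"},
    "S4": {"bookstore.sid"},
}
target_relations = {"T1": {"hasBookSoldAt.aname", "hasBookSoldAt.sid"}}

# Correspondences v1 and v2 as (source column, target column) pairs.
correspondences = {
    "v1": ("person.pname", "hasBookSoldAt.aname"),
    "v2": ("bookstore.sid", "hasBookSoldAt.sid"),
}


def covered_triples(sources, targets, corrs):
    """List <S, T, v> where v's source column is in S and its target column is in T."""
    triples = []
    for s_name, s_cols in sources.items():
        for t_name, t_cols in targets.items():
            for v_name, (s_col, t_col) in corrs.items():
                if s_col in s_cols and t_col in t_cols:
                    triples.append((s_name, t_name, v_name))
    return triples


triples = covered_triples(source_relations, target_relations, correspondences)
print(triples)
```

Each triple covers only one of the two correspondences, which is why no single baseline candidate fills both columns of the target.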

ASK THE USER ABOUT THE FOLLOWING:

Problem: no single candidate presents an entire tuple to match the target relation: hasBookSoldAt(aname, sid)

WHAT THIS PAPER SEEKS TO ACCOMPLISH
Generate the following: compose “writes” and “soldAt” to produce a new semantic connection between “person” and “bookstore”
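A minimal sketch of that composition over toy data. The sample tuples and the function name compose_has_book_sold_at are hypothetical; only the table shapes writes(pname, bid) and soldAt(bid, sid) come from the slides.

```python
# Toy instances of the two source tables (hypothetical data).
writes = {("alice", "b1"), ("bob", "b2")}                # writes(pname, bid)
sold_at = {("b1", "s1"), ("b1", "s2"), ("b2", "s2")}     # soldAt(bid, sid)


def compose_has_book_sold_at(writes, sold_at):
    """Join writes and soldAt on bid, then project onto (pname, sid)."""
    return {
        (pname, sid)
        for (pname, bid_w) in writes
        for (bid_s, sid) in sold_at
        if bid_w == bid_s
    }


has_book_sold_at = compose_has_book_sold_at(writes, sold_at)
print(sorted(has_book_sold_at))
```

Each result tuple fills both columns of hasBookSoldAt(aname, sid) at once, which the baseline candidates could not do.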

APPROACH: REPRESENTING SEMANTICS OF SCHEMAS
Create a Conceptual Model (CM) graph
Create nodes for classes and attributes
Create directed edges for relationships and inverses
C1 ---ISA--- C2    subclasses
C ---p--- D    relationships
C ---p->-- D    functional relationships
Duplicate concept nodes to represent recursive relationships
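One way to hold such a graph in code is sketched below. The class name CMGraph, its methods, and the "^-1" suffix for inverse edges are all assumptions made for illustration; treating ISA as a functional edge is also an assumption of this sketch rather than something the slides state.

```python
from dataclasses import dataclass, field


@dataclass
class CMGraph:
    """Conceptual-model graph: class nodes, their attributes, labeled directed edges."""
    attributes: dict = field(default_factory=dict)  # class name -> set of attribute names
    edges: set = field(default_factory=set)         # (source, label, target, is_functional)

    def add_class(self, name, attrs=()):
        self.attributes.setdefault(name, set()).update(attrs)

    def add_relationship(self, src, label, dst, functional=False, inverse_functional=False):
        # Store the relationship and its inverse as two directed edges.
        self.edges.add((src, label, dst, functional))
        self.edges.add((dst, label + "^-1", src, inverse_functional))

    def add_isa(self, sub, sup):
        # Assumption of this sketch: ISA is functional from subclass to superclass.
        self.edges.add((sub, "ISA", sup, True))

    def functional_edges(self):
        return {edge for edge in self.edges if edge[3]}


g = CMGraph()
g.add_class("Person", {"pname"})
g.add_class("Book", {"bid"})
g.add_class("Bookstore", {"sid"})
g.add_relationship("Person", "writes", "Book")
g.add_relationship("Book", "soldAt", "Bookstore")
```

Here writes and soldAt are stored as non-functional in both directions, so functional_edges() returns an empty set for this graph.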

GENERATING MAPPING CANDIDATES
Problem description

Inputs:
A source relational schema S and a target relational schema T
A conceptual model (GS and GT respectively) associated with each relational schema via table semantic mappings
A set of correspondences L linking a set L(S) of columns in S to a set L(T) of columns in T

Goal:
A pair of expressions <E1,E2> which are “semantically similar” in terms of modeling the subject matter

MARKED NODES
The set L(S) of columns gives rise to a set CS of marked class nodes in the graph GS
Likewise, the set L(T) gives rise to a set CT of marked class nodes in the graph GT
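A minimal sketch of marking, assuming the table semantics can be reduced to a lookup from column to class node. The dictionary source_semantics and the function name marked_class_nodes are hypothetical; the linked columns are those touched by v1 and v2 in Example 1.

```python
def marked_class_nodes(linked_columns, column_to_class):
    """Return the class nodes whose attributes are tied to the linked columns."""
    return {column_to_class[col] for col in linked_columns if col in column_to_class}


# Hypothetical table semantics for the bookstore source schema.
source_semantics = {
    "person.pname": "Person",
    "book.bid": "Book",
    "bookstore.sid": "Bookstore",
}

L_S = {"person.pname", "bookstore.sid"}  # source columns used by v1 and v2
C_S = marked_class_nodes(L_S, source_semantics)
print(sorted(C_S))
```

Book is not marked because no correspondence touches book.bid; it can still appear as an intermediate node in a connecting subgraph.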

BASIC ALGORITHM
Create conceptual subgraphs
Find a subgraph D1 connecting concept nodes in CS, and a subgraph D2 connecting concept nodes in CT, such that D1 and D2 are “semantically similar”

Suggest possible mapping candidates
Translate D1 and D2 into algebraic expressions E1 and E2 and return the triple <E1,E2,LM> as a mapping candidate

CREATING CONCEPTUAL SUBGRAPHS
Notice simple matches
A node v in CS corresponds to a node u in CT when v and u have attributes that are associated with corresponding columns via the table semantics

More complicated rules
The connections (v1,v2) and (u1,u2) should be “semantically similar” or at least “compatible” (cardinality constraints, relationships like “is-a” or “part of”)

Use edges from pre-selected trees
Represent “intuitively meaningful” concepts
Favor smaller trees (Occam’s razor)

Other considerations
Favor lossless joins
Reject contradictions

EXAMPLE
Looking for a functional tree with a root corresponding to the anchor Proj

EXAMPLE
Notice simple matches
Find a tree with minimal cost (edges in pre-selected trees don’t contribute to cost)
Find a tree containing the largest number of edges in the pre-selected trees

Project ---controlledBy->-- Department --hasManager->-- Employee
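The cost rule can be sketched as a shortest-path search in which pre-selected edges cost 0 and all other edges cost 1. This is a simplification (a single path between two marked nodes instead of a full tree) and not the paper's algorithm. The hasSupervisor edge is a hypothetical competitor added for illustration, and which edges count as pre-selected is assumed here.

```python
import heapq


def cheapest_functional_path(edges, preselected, root, target):
    """Dijkstra over directed functional edges; pre-selected edges cost 0, others 1."""
    adjacency = {}
    for src, label, dst in edges:
        cost = 0 if (src, label, dst) in preselected else 1
        adjacency.setdefault(src, []).append((cost, label, dst))

    frontier = [(0, root, [root])]  # (cost so far, current node, path taken)
    settled = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for step_cost, label, nxt in adjacency.get(node, []):
            if nxt not in settled:
                heapq.heappush(frontier, (cost + step_cost, nxt, path + [label, nxt]))
    return None  # target not reachable through functional edges


functional_edges = [
    ("Project", "controlledBy", "Department"),
    ("Department", "hasManager", "Employee"),
    ("Project", "hasSupervisor", "Employee"),  # hypothetical competing edge
]
preselected = {
    ("Project", "controlledBy", "Department"),
    ("Department", "hasManager", "Employee"),
}

result = cheapest_functional_path(functional_edges, preselected, "Project", "Employee")
print(result)
```

Under these assumptions the longer route through Department wins, because both of its edges are pre-selected and therefore free, while the direct hypothetical edge costs 1.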

MORE COMPLICATED EXAMPLE

Same answer: Project ---controlledBy->-- Department --hasManager->-- Employee

Still looking for low-cost, minimal trees to connect Employee to Project

DEALING WITH N-ARY RELATIONS
StoreSells(Person, Product)

CONSIDERATIONS FOR REIFIED RELATIONSHIPS
A path of length 2 passing through a reified relationship node should be considered to be length 1

The semantic category of a target tree rooted at a reified relationship induces preferences for similarly rooted (minimal) functional trees in the source (cardinality restrictions, number of roles, subclass relationship to top level ontology concept)
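The length rule can be sketched as a small adjustment to edge counting. The function name adjusted_path_length is hypothetical, and the example path reuses the StoreSells relationship from the n-ary slide, treated here as a reified node.

```python
def adjusted_path_length(path_nodes, reified_nodes):
    """Count edges, but count each two-edge hop through a reified node as one edge."""
    edge_count = len(path_nodes) - 1
    # Only interior nodes are "passed through"; endpoints do not earn a discount.
    passed_through = sum(1 for node in path_nodes[1:-1] if node in reified_nodes)
    return edge_count - passed_through


reified = {"StoreSells"}
length = adjusted_path_length(["Person", "StoreSells", "Product"], reified)
print(length)
```

With this adjustment a connection through a reified relationship competes on equal terms with an ordinary binary relationship edge when smaller trees are favored.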

OBTAINING RELATIONAL EXPRESSIONS

EXPERIMENTAL RESULTS

AVERAGE PRECISION

AVERAGE RECALL

CONCLUSIONS
The semantic approach performs at least as well as the RIC-based approach on the datasets studied

The semantic approach made significant improvements in some cases

Many of the datasets did not have complicated schemas; the semantic approach didn’t provide as much benefit in those cases

STRENGTHS/WEAKNESSES
Strengths
Lots of examples
Provides a useful solution to a common problem

Weaknesses
Formalism sometimes made things more complicated rather than more clear
Assumes a lot of background knowledge

FUTURE WORK
Embed this functionality into pre-existing mapping tools (they suggest Clio, since a lot of their work is based on it)

Add negation to the semantic representation
Investigate more complex semantic mappings

QUESTIONS???