1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang...

46
1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore

Transcript of 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang...

Page 1: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

1

Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach

Xia Yang Mong Li Lee Tok Wang Ling

National University of Singapore

Page 2: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

2

Outline

Introduction Background Preliminaries Motivating Example Integration Algorithm Related work Conclusion

Page 3: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

3

Introduction

Recent research in integrating XML data sources has mainly concentrated on schema matching.

The XML Schema or DTD is lacking in semantics.

The source schemas are heterogeneous, containing various conflicts involving naming conflict, cardinality conflict, structural conflict.

Page 4: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

4

Background

Most of the work of integration of XML has focused on the matching problem to find equivalent elements among the different sources.

LSD [4] employs instance information and machine learning techniques base on the instance information in their integration work.

E. Jeong and C.-N. Hsu [7] use schema learning to generate a set of tree grammar rules from the DTDs in a class and optimizes the rules to transforms them into an integrated view.

[4].A. Doan, P. Domingos, A. Levy. Learning Source Descriptions for Data Integration. WebDB, 2000.[7].E. Jeong, C.-N. Hsu. Induction of Integrated View for XML Data with Heterogeneous DTDs. ACM CIKM, 2001.

Page 5: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

5

Background (cont.)

E. Jeong and C.-N. Hsu [7] DTD clustering clusters DTDs in similar domains into

classes. Schema learning applies a tree grammar inference

technique to generate a set of tree grammar rules from the DTD in a class from the previous step.

Minimization optimizes the rules generated in the previous step and transforms them into an integrated view.

Page 6: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

6

Background (cont.)

These work lack of semantic meaning, which may lead to the wrong integrated schema.

All these work do not take into consideration the importance of the individual data sources, and how the majority of the local schemas model their data.

They are binary strategies, cannot take the importance of sources into consideration.

Page 7: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

7

Preliminaries

ORA-SS Model The ORA-SS model (Object-Relationship-

Attribute model for Semi-Structured data) is a semantically rich data model that has been designed for semi-structured data [5].

The ORA-SS model distinguishes between objects, relationships and attributes.

[5]. G. Dobbie, X. Wu, T.W. Ling, M.L. Lee. ORA-SS: An Object-Relationship-Attribute Model for Semi-structured Data. Technical Report TR21/00, National University of Singapore, 2000.

Page 8: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

8

Preliminaries (cont.)

ORA-SS Schema Diagram example.

p ar t

s u p p lie rs n o

p n o q u an t it y

jp s ,3 ,1 :n ,1 :n

jp s

p r o jec t

jn o

jp ,2 ,1 :n ,1 :n

f u n d s

u n o

p r o jec t m an ag er

n am em n o

Page 9: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

9

Preliminaries (cont.) Assumptions for the algorithm

The input to the proposed integration algorithm is a set of ORA-SS schemas with source weight.

The output of the algorithm is an integrated schema, also modeled in ORA-SS.

The integrated schema should contain all the information modeled in the original schemas. Further, the integrated schema should be as simple and concise as possible to facilitate users’ understanding.

For meaningful integration to occur, we assume that the various sources model similar domains.

Object classes and relationship sets with the same label name are considered to be semantically equivalent. Attributes of the same object class (or relationship set) with the same label name are also semantically equivalent.

Page 10: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

10

Motivating Examplep r o jec t

p ar tjn o

p n o

lo c a l f u n d s

ln o

f o r e ig n f u n d s

fn o

p ro jectm an ager

p r o jec t m an ag er

m n o n am e em ail

o r g an iza tio n

o r g n am e

ab b rev iat io n fu ll n am e

p ar t

s u p p lie rs n o

p n o q u an t it y

jp s ,3 ,1 :n ,1 :n

jp s

p r o jec t

jn o

jp ,2 ,1 :n ,1 :n

f u n d s

u n o

p r o jec t m an ag er

n am em n o

(a) Schema S1, sw1=1 (b) Schema S2, sw2=1

(c) Schema S3, sw3=7 (d) Schema S4, sw4=1 The swi under each schema indicates the source weight, i.e., the importance of a source. This is determined by users or computed based on some statistic information.

p r o jec t

s u p p lie rjn o

s n on am e p r o jec t m an ag er

m n o

s ta f f

o r d in ar y s ta f f

en o

p ar t

p n o

js ,2 ,1 :n ,1 :n

js p ,3 ,1 :n ,1 :n

n am e

o r g n am e

ab b rev iat io n fu ll n am e

ad d res s

Page 11: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

11

Motivating Example

A. Resolve attribute-object class conflict. B. Resolve generalizations and

specializations. C. Merge the schemas to obtain an

integrated graph. D. Transform integrated graph to resolve

structural conflicts and remove redundancy. E. Augment Graph with Attributes.

Page 12: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

12

A. Resolve attribute-object class conflict.

This occurs when a concept has been modeled as an attribute in one schema, and as an object class in another schema.

This conflict can be resolved by

transforming the attribute to an object class.

Page 13: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

13

A. Resolve attribute-object class conflict. (cont.) (example)

p r o jec t

p ar tjn o

p n o

lo c a l f u n d s

ln o

f o r e ig n f u n d s

fn o

p ro jectm an ager

p r o jec t m an ag er

m n o n am e em ail

o r g an iza tio n

o r g n am e

ab b rev iat io n fu ll n am e

p r o jec t

p ar tjn o

p n o

lo c a l f u n d s

ln o

f o r e ig n f u n d s

fn o

p r o jec t m an ag er

m n o

(a) Schema S1, sw1=1 (b) Schema S2, sw2=1

(c) Schema S1’: Attribute “project manager” in schema S1 hasbeen transformed into an object class “project manager” in S1’.

Page 14: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

14

B. Resolve generalizations and specializations.

A generalization exists when an object class in one schema is the union of several object classes in another schema.

The integrated schema will include the generalization isa hierarchy.

Page 15: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

15

B. Resolve generalizations and specializations. (cont.) (example)

p r o jec t

p ar tjn o

p n o

lo c a l f u n d s

ln o

f o r e ig n f u n d s

fn o

p ro jectm an ager

p ar t

s u p p lie rs n o

p n o q u an t it y

jp s ,3 ,1 :n ,1 :n

jp s

p r o jec t

jn o

jp ,2 ,1 :n ,1 :n

f u n d s

u n o

p r o jec t m an ag er

n am em n o

lo c a l f u n d s

ln o

f o r e ig n f u n d s

fn o

f u n d s

Schema S1, sw1=1 Schema S4, sw4=1

Build a generalization hierarchy from part of S1, which is used for next step to generate an integrated graph

Page 16: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

16

C. Merge the schemas to obtain an integrated graph.

Each node in the graph denotes an object class, and edges represent the relationship sets among the object classes.

To facilitate processing, attributes are first omitted from the integrated graph. The attributes will be incorporated into the final integrated schema.

Compute the edge weight.

Page 17: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

17

C. Merge the schemas to obtain an integrated graph. (cont.)

Compute the edge weight. For each original source, the edge weight is the

source weight multiplied by the number of relationship sets involved in this edge.

The edge weight in the integrated graph is the sum of all the edge weights of this edge from the original sources.

Page 18: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

18

C. Merge the schemas to obtain an integrated graph. (cont.) (example)

p r o jec t

p ar tjn o

p n o

lo c a l f u n d s

ln o

f o r e ig n f u n d s

fn o

p ro jectm an ager

p r o jec t m an ag er

m n o n am e em ail

o r g an iza tio n

o r g n am e

ab b rev iat io n fu ll n am e

p ar t

s u p p lie rs n o

p n o q u an t it y

jp s ,3 ,1 :n ,1 :n

jp s

p r o jec t

jn o

jp ,2 ,1 :n ,1 :n

f u n d s

u n o

p r o jec t m an ag er

n am em n o

(a) Schema S1, sw1=1 (b) Schema S2, sw2=1

(c) Schema S3, sw3=7 (d) Schema S4, sw4=1

p r o jec t

s u p p lie rjn o

s n on am e p r o jec t m an ag er

m n o

s ta f f

o r d in ar y s ta f f

en o

p ar t

p n o

js ,2 ,1 :n ,1 :n

js p ,3 ,1 :n ,1 :n

n am e

o r g n am e

ab b rev iat io n fu ll n am e

ad d res s

Page 19: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

19

C. Merge the schemas to obtain an integrated graph. (cont.)

Example of edge weight Since we have “project” as the parent of “project

manager” in schemas S1 and S4, the weight of the edge from “project” to “project manager” is given by the sum of the weights of these schemas, that is, 1+1=2.

Since “project” is the parent of “staff” in schema S3 only, the weight of this edge is 7. Since the edge from “project” to “supplier” in S3 is actually involved in two relationship sets js and jsp, its edge weight would be given by 7*2=14.

Page 20: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

20

C. Merge the schemas to obtain an integrated

graph. (cont.) (example)

p r o jec t

s u p p lie r

p r o jec t m an ag er

s ta f f

o r d in ar y s ta f f

p ar t

js ,2 ,1 :n ,1 :n

js p ,3 ,1 :n ,1 :n

o r g an iza tio n

o r g n am e

f u n d s

f o r e ig n f u n d slo c a l f u n d s

37

71

1 4 2

7

2

7

1

1

7

1 1

Integrated graph obtained from the schemas in page 10.

Page 21: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

21

D. Transform integrated graph to resolve structural conflicts and remove redundancy.

D-1. Differentiate semantically different relationship sets among equivalent object classes.

D-2. Remove relationship sets that are projections of higher degree relationship sets.

D-3. Resolve ancestor-descendant conflicts.

D-4. Remove transitive relationship sets.

D-5. Remove other type of multiple parent nodes.

Page 22: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

22

multiple parent nodes

If a node has more than one incoming edges in an integrated graph, it is called a multiple parent node.

Page 23: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

23

D-1. Differentiate semantically different relationship sets among equivalent object classes.

If the relationship sets among the same object classes are semantically different.

Then duplicate the nodes and make foreign key-key references in the integrated graph. Move the object classes involved in n-nary (n>2) relationship set.

Page 24: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

24

D-1. Differentiate semantically different relationship sets among equivalent object classes. (cont.) (example)

p er s o n

h o u s en am e

ad d res s

p h 1 ,2 ,1 :n ,1 :n

t y p e

p er s o n

h o u s en am e

ad d res s

p h 2 ,2 ,1 :n ,1 :n

t y p ec o n tr ac t

cid t im e

p h c,3 ,1 :1 ,1 :n

p er s o n

h o u s e

p h 2 ,2 ,1 :n ,1 :n

c o n tr ac t

p h 1 ,2 ,1 :n ,1 :n

p h c,3 ,1 :1 ,1 :n

Integrated graph G56 Transformed graph G56’ because ph1 and ph2 are semantically

different.

Schema S5 Schema S6

p er s o n

h o u s e 1

ad d res s

p h 1 ,2 ,1 :n ,1 :n

h o u s e 2

ad d res s

p h 2 ,2 ,1 :n ,1 :n

h o u s e

ad d res s

c o n tr ac t

p h c,3 ,1 :1 ,1 :n

t y p e

back

Ph1 and ph2 are semantically different.

Page 25: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

25

D-2. Remove relationship sets that are projections of higher degree relationship sets.

A schema may model a relationship set that is a projection of another relationship set in another schema.

Keep the complete relationship set in the integrated schema.

Page 26: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

26

D-2. Remove relationship sets that are projections of higher degree relationship sets. (cont.) (example)

p r o jec t

p ar tjn o

p n o

lo c a l f u n d s

ln o

f o r e ig n f u n d s

fn o

p ro jectm an ager

jp

p r o jec t

s u p p lie r

p ar t

p r o jec t

s u p p lie r

p ar t

Schema S1 Schema S3

Part of Integrated graph G13 Part of Transformed graph G13’

If jp is a projection of jsp. p r o jec t

s u p p lie rjn o

s n on am e p r o jec t m an ag er

m n o

s ta f f

o r d in ar y s ta f f

en o

p ar t

p n o

js ,2 ,1 :n ,1 :n

js p ,3 ,1 :n ,1 :n

n am e

o r g n am e

ab b rev iat io n fu ll n am e

ad d res s

Page 27: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

27

D-3. Resolve ancestor-descendant conflicts. An ancestor-descendant conflict arises when a schema

models an object class A as an ancestor of object class B, and another schema models B as the ancestor of A. Such conflicts appear as cycles in the integrated graph.

The simplest form of this conflict is the parent-child conflict.

In the ancestor-descendant conflict cycle, remove the edge with the smallest edge weight, if this relationship set can be derived from other relationship sets in the cycle. If there are two edges with the same smallest edge weight, remove either one.

Page 28: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

28

D-3. Resolve ancestor-descendant conflicts. (cont.) (example)

Parent-child conflict

p r o jec t

s u p p lie r

p r o jec t m an ag er

s ta f f

o r d in ar y s ta f f

p ar t

js ,2 ,1 :n ,1 :n

js p ,3 ,1 :n ,1 :n

o r g an iza tio n

o r g n am e

f u n d s

f o r e ig n f u n d slo c a l f u n d s

37

71

1 4 2

7

2

7

1

1

7

1 1

Page 29: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

29

D-3. Resolve ancestor-descendant conflicts. (cont.) (example)

Parent-child conflict: the edge from part to supplier is removed.

p r o jec t

s u p p lie r

p r o jec t m an ag er

s ta f f

o r d in ar y s ta f f

p ar t

js ,2 ,1 :n ,1 :n

js p ,3 ,1 :n ,1 :n

o r g an iza tio n

o r g n am e

f u n d s

f o r e ig n f u n d slo c a l f u n d s

37

7

1 4 2

7

2

7

1

1

7

1 1

Page 30: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

30

D-3. Resolve ancestor-descendant conflicts. (cont.) (example)

d ep ar tm en t

h eadd n am e

h id h ead s ec r e ta r y

s id

d h

h s

h ead s ec r e ta r y

d ep ar tm en ts id

d n am e

s d

d ep ar tm en t

h ead

h ead s ec r e ta r y

d h

h s

s d

h ead s ec r e ta r y

d ep ar tm en t

h ead

s d

d h

h ead

d ep ar tm en t

h s

s d

h ead s ecret ary

Schema S7 Schema S8 Integrated graph G78

Case 1: Case 2(a): Case 2(b): if sd can be derived by dh and hs if hs derived by sd and dh if dh derived by hs and sd) and sw7=2 and sw8=1) and sw7=1 and sw8=2 and sw7=1 and sw8=2

Transformed graph G78’ Transformed graph G78’’(a) Transformed graph G78’’(b)

d ep ar tm en t

h ead

h ead s ec r e ta r y

d h

h s

Page 31: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

31

D-4. Remove transitive relationship sets and redundant object classes. If one relationship set from object class A to object class

B can be derived from relationship sets which is from A to other object class sets and back to B, it is called transitive relationship set.

Transitive relationship sets are also redundant, and can be removed so that the resulting integrated graph will be concise, if the intermediate node has attribute or other sub-object classes.

If the intermediate node has no attribute and no other sub-object classes, it will be considered as redundant object classes.

Page 32: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

32

D-4. Remove transitive relationship sets and redundant object classes. (cont.) (example)

p r o jec t

s u p p lie r

p r o jec t m an ag er

s ta f f

o r d in ar y s ta f f

p ar t

js ,2 ,1 :n ,1 :n

js p ,3 ,1 :n ,1 :n

o r g an iza tio n

o r g n am e

f u n d s

f o r e ig n f u n d slo c a l f u n d s

37

7

1 4 2

7

2

7

1

1

7

1 1

Page 33: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

33

D-4. Remove transitive relationship sets and redundant object classes. (cont.) (example)

p r o jec t

s u p p lie r

p r o jec t m an ag er

s ta f f

o r d in ar y s ta f f

p ar t

js ,2 ,1 :n ,1 :n

js p ,3 ,1 :n ,1 :n

o r g n am e

f u n d s

f o r e ig n f u n d slo c a l f u n d s

7

7

1 4 2

7 7

7

1 1

Page 34: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

34

D-5. Remove other type of multiple parent nodes. If a node has more than one incoming edges in

an integrated graph, it is called a multiple parent node. Case 1: D-1 Different relationship sets among the

same object classes. Case 2: D-2 Relationship sets that are projections of

the higher degree relationship sets. Case 3: D-3 Ancestor-descendant conflicts. Case 4: D-4 Transitive relationship sets. Case 5: D-5 Others. As examples in the following

page.

Page 35: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

35

D-5. Remove multiple parent nodes. (cont.) (example)

s c h o o l

s tu d en ts cn am e

s n u em ail

p r o jec t

s td u en tjn o

s n u ad d res s

jd

jd

m ark

s c h o o l

s td u en t

p r o jec t

s c h o o l

s tu d en t1

s n u

s tu d en t2

s n u

s tu d en t

s n u

p r o jec t

m ark

jd

jd

em ail ad d res s

Schema S9 Schema S10 Integrated graph G9-10

Transformed Graph G9-10’ back

Case 5:

Page 36: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

36

Transformed graph (summary)

Original integrated graph

p r o jec t

s u p p lie r

p r o jec t m an ag er

s ta f f

o r d in ar y s ta f f

p ar t

js ,2 ,1 :n ,1 :n

js p ,3 ,1 :n ,1 :n

o r g an iza tio n

o r g n am e

f u n d s

f o r e ig n f u n d slo c a l f u n d s

37

71

1 4 2

7

2

7

1

1

7

1 1

Page 37: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

37

Transformed graph (cont.)

Transformed graph

p r o jec t

s u p p lie r

p r o jec t m an ag er

s ta f f

o r d in ar y s ta f f

p ar t

js ,2 ,1 :n ,1 :n

js p ,3 ,1 :n ,1 :n

o r g n am e

f u n d s

f o r e ig n f u n d slo c a l f u n d s

7

7

1 4 2

7 7

7

1 1

Page 38: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

38

E. Augment Graph with Attributes

Augment the graph with the attributes of object classes in the integrated schema.

Augment the graph with the attributes of relationship sets in the integrated schema.

For the attributes of duplicated object classes in case D-1 and D-5, the attributes will become the attributes of the original ones, not the duplicated object classes.D-1 D-5

For attributes of relationship sets which have been removed, the attributes will be the attributes of relationship sets which could derive this relationship set.

Page 39: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

39

Final integrated schema (example)

p r o jec t

s u p p lie rjn o

s n on am e p r o jec t m an ag er

m n o

s ta f f

o r d in ar y s ta f f

en o

p ar t

p n o

js ,2 ,1 :n , 1 :n

js p ,3 ,1 :n ,1 :n

n am e

o r g n am e

ab b rev iat io n fu ll n am e

lo c a l f u n d s

f u n d s

f o r e ig n f u n d s

ln o fn oem ailq u an t it y ad d res s

js p

back

Page 40: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

40

Integration Algorithm

1. Preprocessing.a. Resolve attribute-object class conflict.

b. Resolve generalizations and specializations.

2. Construct integrated graph.

3. Transform graph.

4. Augment graph with attributes.

Page 41: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

41

Step 3 Transform Graph

3.1 Differentiate semantically different relationship sets among equivalent object classes.

3.2 Remove relationship sets that are projections of higher degree relationship sets.

3. 3 Resolve any ancestor-descendant conflicts which create cycles in G.

3.4 Remove transitive relationship sets and redundant object classes.

3.5 Remove other multiple parent nodes.

Page 42: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

42

Related work (compare with [7])

p r o jec t

s u p p lie rjn o

s n o

n am e p r o jec t m an ag er

m n o

s ta f f

o r d in ar y s ta f f

en o

p ar t

p n o

js ,2 ,1 :n ,1 :n

js p ,3 ,1 :n , 1 :n

n am e

q u an t it y

lo c a l f u n d s

ln o

f o r e ig n f u n d s

fn o

f u n d s

u n o

em ail ad d res s

o r g n am e

ab b rev iat io n fu ll n am e

problems by Jeong and C.-N. Hsu [7].

Page 43: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

43

Related work (compare with [7]) (cont.)

p r o jec t

s u p p lie rjn o

s n on am e p r o jec t m an ag er

m n o

s ta f f

o r d in ar y s ta f f

en o

p ar t

p n o

js ,2 ,1 :n , 1 :n

js p ,3 ,1 :n ,1 :n

n am e

o r g n am e

ab b rev iat io n fu ll n am e

lo c a l f u n d s

f u n d s

f o r e ig n f u n d s

ln o fn oem ailq u an t it y ad d res s

js p

Integrated schema obtained by our approach

Page 44: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

44

Related work (compare with [7]) (cont.)

Our proposed method employs the ORA-SS conceptual model which is able to capture the semantics necessary for the resolution of structural conflict during integration.

We could take into consideration the importance of the individual data sources, and how the majority of the local schemas model their data.

We employ n-nary strategy. While The binary strategy will not be able to utilize the source importance and how the majority of the sources model the data. N-nary strategy is also faster compare to the binary strategy.

( For example, sw1=2,sw2=1,sw3=1,sw4=1and schema 2, 3, 4 are same. The binary strategy might treat sw1 as the most important schema, while in fact schema 2, 3 and 4 are.)

Page 45: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

45

Conclusion

In this paper, we have introduced a semantic approach to resolve structural conflicts in the integration of XML schemas.

We employed the ORA-SS semantic data model to capture the implicit semantics in an XML schema.

We presented a comprehensive n-nary algorithm to integrate XML schemas.

our algorithm takes into account the data semantics, the importance of a source, and how the majority of the sources model their data.

Structural conflicts such as attribute/object class conflict, ancestor-descendant conflict are resolved in our approach. We also remove redundant object classes and relationship sets such as transitive relationship sets, and relationship sets, which are projections of higher degree relationship sets in order to obtain a concise integrated schema.

Page 46: 1 Resolving Structural Conflicts in the Integration of XML Schemas: A Semantic Approach Xia Yang Mong Li Lee Tok Wang Ling National University of Singapore.

46

References 1. S.Castano, V. Antonellis, S. C. Vimercati, M. Melchiori. An XML-Based Framework for Information Integration over the Web. IIWAS,

2000. 2.Y.B. Chen, T.W. Ling, M.L. Lee. Designing Valid XML Views. ER, 2002. 3.P. Buneman, S. Davidson, W. Fan, C. Hara, W.C. Tan. Keys for XML. WWW, 2001. 4.A. Doan, P. Domingos, A. Levy. Learning Source Descriptions for Data Integration. WebDB, 2000. 5.G. Dobbie, X. Wu, T.W. Ling, M.L. Lee. ORA-SS: An Object-Relationship-Attribute Model for Semi-structured Data. Technical Report

TR21/00, National University of Singapore, 2000. 6.E. Rahm, P. Bernstein. On Matching Schemas Automatically. MSR Tech. Report MSR-TR-2001-17, 2001. 7.E. Jeong, C.-N. Hsu. Induction of Integrated View for XML Data with Heterogeneous DTDs. ACM CIKM, 2001. 8.T.W. Ling, M.L. Lee. Relational to Entity-Relationship Schema Translation Using Semantic and Inclusion Dependencies, in Journal of

Integrated Computer-Aided Engineering, John-Wiley Publishers, Vol 2, No 2, pages 125-145, 1995. 9.M.L. Lee, T.W. Ling. Resolving Structural Conflicts in the Integration of Entity-Relationship Schemas. OOER, 1995. 10.M.L. Lee, T.W. Ling. Resolving Constraint Conflicts in the Integration of Entity-Relationship Schemas. ER, 1997. 11.M.L. Lee, T.W. Ling, W.L. Low. Designing Functional Dependencies for XML, EDBT, 2002. 12.M.L. Lee, L.H. Yang, W. Hsu, X. Yang. XClust: Clustering XML Schemas for Effective Integration, ACM CIKM, 2002. 13.D. Maier. Theory of Relational Databases. Computer Science Press, 1983. 14.J. Madhavan, P.A. Bernstein, E. Rahm. Generic Schema Matching with Cupid. VLDB, 2001. 15.R. Mello, S. Castano, C.A. Heuser. A Method for the Unification of XML. Information and Software Technology Journal, 2002. 16.P. Mitra, G. Wiederhold and J. Jannink. Semi-automatic Integration of Knowledge Sources. Fusion, 1999. 17.P. Mitra, G. Wiederhold, M. Kersten. A Graph-Oriented Model for Articulation of Ontology Interdependencies. EDBT 2000. 18.F. Naumann, U. Leser, J.C. Freytag. Quality-driven Integration of Heterogeneous Information Systems. VLDB, 1999. 19.C. Reynaud, J.-P. Sirot, D. Vodislav. Semantic Integration of XML Heterogeneous Data Sources. IDEAS, 2001. 20.http://www.cogsci.princeton.edu/~wn 21.Xyleme. A dynamic warehouse for XML Data of the Web. IEEE Data Engineering Bulletin 24(2):40-47, 2001. 22.L.L. Yan, T.W. Ling. Translating Relational Schema with Constraints into OODB Schema. IFIP DS-5 Semantics of Interoperable

Database Systems. 1992