Generic Model Management A Database Infrastructure for Schema Manipulation
© 2001 Microsoft Corp. 1
Generic Model Management
A Database Infrastructure for Schema Manipulation
Philip A. Bernstein
Microsoft Corporation
September 6, 2001
The Problem
• There are 30 years of DB research on meta data, but we don’t have great infrastructure to offer
  – Most design tools and web services store meta data in files, not DBs
  – OODBMSs are not a huge success
  – Most meta data driven tools use their own infrastructure
• Goal: generic meta data manipulation infrastructure
  – Reduce the amount of programming required to build meta data driven applications
• Proposal: Model Management
  – Define an algebra to manipulate meta data in large chunks, called models and mappings
Outline
• Overview of Model Management
• Solutions to classical meta data problems
• Recent technical results
Models and Mappings
• Model – a complex information structure
  – XML schema, SQL schema, OO interface, UML model, web site map, make script, …
• Mapping – a transformation from one model into another
  – Map between two XML schemas
  – Map a SQL schema to an XML schema
  – Map data sources to a data warehouse
  – Map an ER diagram to a SQL schema
  – Map a process definition to a workflow script
Representation
• A model is a directed graph with one root.
• A mapping is a model each of whose nodes connects nodes of two other models.
[Figure: a relational schema with elements Emp, E#, Dept#, Name and an XSD with elements Emp, E#, Dept#, Name, First, Last, connected by the mapping map1.]
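The two definitions above can be sketched in a few lines of Python. This is only an illustration, not the representation used by the author's system: the class names `Model` and `Mapping` are hypothetical, and nesting First and Last under Name in the XSD is an assumption about the figure.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """A rooted directed graph: a root name plus parent -> children edges."""
    root: str
    edges: dict[str, list[str]] = field(default_factory=dict)

    def nodes(self) -> list[str]:
        """Return every node reachable from the root, depth first."""
        seen, stack = [], [self.root]
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.append(node)
                stack.extend(reversed(self.edges.get(node, [])))
        return seen


@dataclass
class Mapping:
    """A model in its own right: each link connects a node of `left` to a node of `right`."""
    left: Model
    right: Model
    links: list[tuple[str, str]] = field(default_factory=list)

    def add(self, left_node: str, right_node: str) -> None:
        # Reject links to nodes that are not part of the two connected models.
        if left_node not in self.left.nodes() or right_node not in self.right.nodes():
            raise ValueError("mapping nodes must connect existing model nodes")
        self.links.append((left_node, right_node))


relational = Model("Emp", {"Emp": ["E#", "Dept#", "Name"]})
# Assumption: First and Last are children of Name in the XSD.
xsd = Model("Emp", {"Emp": ["E#", "Dept#", "Name"], "Name": ["First", "Last"]})

map1 = Mapping(relational, xsd)
for name in ("Emp", "E#", "Dept#", "Name"):
    map1.add(name, name)
```

Keeping the mapping as a separate object, rather than as annotations on either model, is what lets the later operators take mappings as inputs and return them as results.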
Model Management Algebra
• Match
• Merge
• Compose
• Select
• Diff
• Enumerate
• ApplyFunction
• Copy
• Update operations
map = Match(M1, M2, )
• Match(M1, M2, ) returns the best mapping between M1 and M2 with respect to its third argument
[Figure: map1 links elements of M1 (Emp, E#, Dept#, Name, Addr) to elements of M2 (Emp, E#, Dept#, Name, First, Last, Phone); two of the links are labeled "=".]
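As a rough illustration of what a Match result looks like, the sketch below pairs elements whose names are equal, ignoring case. It is a deliberately naive stand-in and not the matching algorithm discussed later in the talk; the function name `match_by_name` and the flat list representation of a model are assumptions.

```python
def match_by_name(m1: list[str], m2: list[str]) -> list[tuple[str, str]]:
    """Pair each element of m1 with the element of m2 that has the same name (case-insensitive)."""
    by_name = {name.lower(): name for name in m2}
    return [(name, by_name[name.lower()]) for name in m1 if name.lower() in by_name]


M1 = ["Emp", "E#", "Dept#", "Name", "Addr"]
M2 = ["Emp", "E#", "Dept#", "Name", "First", "Last", "Phone"]

map1 = match_by_name(M1, M2)
```

Addr, First, Last, and Phone stay unmatched here; a realistic matcher would also weigh structure, types, and synonyms before deciding.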
M3 = Merge(M1, M2, map)
• Return the union of models M1 and M2
  – Use map to guide the Merge
  – If elements x = y in map, then collapse them into one element
[Figure: M1 = Emp (Addr, Name) and M2 = Emp (Name, Phone), connected by a mapping with "=" links, merge into Emp (Name, Phone, Addr).]
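The union-and-collapse behaviour described above can be sketched as follows. This is a minimal illustration under simplifying assumptions: models are plain parent -> children dictionaries, the mapping is a list of equal pairs, and the merged element keeps the name it has in M1. The function `merge` is hypothetical and ignores the conflict handling a real Merge needs.

```python
def merge(m1: dict[str, list[str]],
          m2: dict[str, list[str]],
          mapping: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Union of two models; elements paired by `mapping` collapse into one element."""
    collapsed = {right: left for left, right in mapping}  # m2 name -> m1 name
    merged = {parent: list(children) for parent, children in m1.items()}
    for parent, children in m2.items():
        target = merged.setdefault(collapsed.get(parent, parent), [])
        for child in children:
            child = collapsed.get(child, child)
            if child not in target:  # already present via the mapping, do not duplicate
                target.append(child)
    return merged


M1 = {"Emp": ["Addr", "Name"]}
M2 = {"Emp": ["Name", "Phone"]}
mapping = [("Emp", "Emp"), ("Name", "Name")]

M3 = merge(M1, M2, mapping)
```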
Left Composition ( f• )
mapC = mapA f• mapB
[Figure: mapA (links a1, a2, a3) connects M1 (Emp, Addr, Street, City) to M2 (Emp, Street, City); mapB (links b1, b2, b3) connects M2 to M3 (Emp, StAddr, Town), with b1 joining two Name elements; the composition mapC (links c1, c2, c3) connects M1 directly to M3.]
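The idea of chaining two mappings through a shared middle model can be sketched with ordinary relational composition. This is not the left-composition operator itself, whose exact semantics the slide does not spell out; the function `compose` and the specific element pairs chosen for mapA and mapB are assumptions made for illustration.

```python
def compose(map_a: list[tuple[str, str]],
            map_b: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Link x to z whenever some y has (x, y) in map_a and (y, z) in map_b."""
    result = []
    for x, y in map_a:
        for y2, z in map_b:
            if y == y2 and (x, z) not in result:
                result.append((x, z))
    return result


# Assumed links: M1 -> M2 keeps names, M2 -> M3 renames Street and City.
map_a = [("Emp", "Emp"), ("Street", "Street"), ("City", "City")]
map_b = [("Emp", "Emp"), ("Street", "StAddr"), ("City", "Town")]

map_c = compose(map_a, map_b)
```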
Model Management Algebra
• map = Match (M1, M2, )
• M3 = Merge (M1, M2, map)
• map3 = Compose(map1, map2)
• M2 = Select(M1, pred)
• M2 = Diff(M1, map)
• list = Enumerate(M)
• ApplyFunction(M, f )
• M2 = Copy(M1)
• Update operations
They’re generic = data model independent … well … implemented on an extended ER model with an extensibility story
Example
• Given
  – map1 from SQL schema rdb1 to xsd1
  – xsd2, which is similar to xsd1
• Produce
  – a map between xsd2 and a relational schema
1. map2 = Match(xsd1, xsd2)
2. map3 = map1 • map2
3. <map4, rdb2> = Copy(map3)
4. Use ApplyFunction(map4) to map each x in Diff(xsd2, map4) into rdb2
[Figure: map1 connects rdb1 to xsd1, map2 connects xsd1 to xsd2, map3 connects rdb1 to xsd2, and map4 connects rdb2 to xsd2.]
Theme
• Classic meta data problems can be solved using Model Management operations
  – Schema integration
  – Schema evolution
  – Data migration
  – Reverse engineering
  – Data reintegration (3-way merging)
• Published solutions to these problems help us produce generic implementations of model management operations
Outline
✓ Overview of Model Management
• Solutions to classical meta data problems
  – Schema integration
  – Schema evolution
  – Reverse engineering
  – Data reintegration (3-way merging)
  – Data migration
• Recent technical results
Schema Integration
• Given
  – two view schemas, V1 and V2
• Produce
  – an integrated schema, S
1. map = Match(V1, V2)
2. S′ = Merge(V1, V2, map)
3. ApplyFunction(S′) // to resolve conflicts in S′, producing S
[Figure: V1 and V2 connected by map, merged into S′, which is then resolved into S.]
[Figure: worked example of schema integration. V1 and V2 are both Emp schemas that share E# and Dept#; between them they also carry Addr, Phone, Name, FirstName, and LastName. map links the equal elements with "=". The merged schema holds E#, Dept#, Addr, Phone, Name, FirstName, and LastName under Emp, with a node f (labels L and R) relating Name to FirstName and LastName.]
1. map = Match(V1, V2)
2. S′ = Merge(V1, V2, map)
3. Use ApplyFunction(S′) to resolve conflicts, producing S
Merging Knowledge Bases (Ontologies)
• Same as schema integration, but applied to ontologies
• The literature on merging ontologies focuses mostly on Match.
Schema Evolution
• Given
  – mapSV from schema S to view V
  – a modified version S′ of S
• Produce
  – a mapping mapS′V from S′ to V (i.e. a view definition for V over S′)
1. mapS′S = Match(S′, S)
2. mapS′V = mapS′S • mapSV
3. Use ApplyFunction(V) to delete elements not derivable from S′
[Figure: mapSV connects S to V, mapS′S connects S′ to S, and mapS′V connects S′ to V.]
Outline
✓ Overview of Model Management
• Solutions to classical meta data problems
  ✓ Schema integration
  ✓ Schema evolution
  – Reverse engineering
  – Data reintegration (3-way merging)
  – Data migration
• Recent technical results
Reverse Engineering
• Given
  – Model M (e.g., an ER model)
  – Model G (e.g., SQL) generated via mapMG from M
  – A modified version G′ of G
• Produce
  – A modified version M′ of M that generates G′
1. mapGG′ = Match(G, G′)
2. mapMG′ = mapMG • mapGG′
3. <M′, mapM′G′> = Copy(mapMG′)
4. Use ApplyFunction(mapM′G′) to reverse engineer each g in Diff(G′, mapM′G′) into M′
[Figure: mapMG connects M to G, mapGG′ connects G to G′, mapMG′ connects M to G′, and mapM′G′ connects M′ to G′.]
3-Way Merge (aka Reintegration)
• Given
  – a source schema S0
  – two derived schemas S1 and S2
• Produce
  – a schema S3 that merges the changes of S1 and S2
The steps below write O for S0, A for S1, and B for S2.
1. MapOA = Match(O, A) (based on OIDs)
2. MapOB = Match(O, B) (based on OIDs)
3. MapOA = ApplyFunction(MapOA) such that for each e ∈ MapOA, if domain(e) = range(e), then delete e (i.e. keep only things changed in A)
4. MapOB = ApplyFunction(MapOB) such that for each e ∈ MapOB, if domain(e) = range(e), then delete e (i.e. keep only things changed in B)
5. ChangedA = range(MapOA)
6. ChangedB = range(MapOB)
7. MapChAChB = Match(ChangedA, ChangedB)
8. MapChBChA = invert(MapChAChB)
9. A = Diff(ChangedA, ChangedB, MapChAChB) (changed in A but not changed in B)
10. B = Diff(ChangedB, ChangedA, MapChBChA)
11. MapAB = Match(A, B) (by OIDs)
12. G = Merge(A, B, MapAB)
13. MapGA = Match(G, A)
14. GA = Merge(G, A, MapGA) with preference for A
15. MapGAB = Match(GA, B)
16. GAB = Merge(GA’, B’, MapGA’B’) with preference for B
17. DeletedA = Diff(O, A, MapOA)
18. DeletedB = Diff(O, B, MapOB)
19. MapDeletedAChangedB = Match(DeletedA, ChangedB)
20. MapDeletedBChangedA = Match(DeletedB, ChangedA)
21. ShouldDeleteA = Diff(DeletedA, ChangedB, MapDeletedAChangedB)
22. ShouldDeleteB = Diff(DeletedB, ChangedA, MapDeletedBChangedA)
23. MapGABSDA = Match(GAB, ShouldDeleteA)
24. GABSDA = Diff(GAB, ShouldDeleteA, MapGABSDA)
25. MapGABSDASDB = Match(GABSDA, ShouldDeleteB)
26. Final result = Diff(GABSDA, ShouldDeleteB, MapGABSDASDB)
[Figure: S0 at the top, S1 and S2 derived from it, S3 merging them at the bottom.]
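The intent of the 26 steps can be condensed into a small sketch when each schema is reduced to a dictionary from OID to element value. This is a simplification and not the algebraic script above: the function `three_way_merge` is hypothetical, and letting B win when both sides changed the same element is an assumption taken from the final "with preference for B" merge.

```python
_MISSING = object()  # marks an OID that is absent from a schema


def three_way_merge(base: dict, a: dict, b: dict) -> dict:
    """Merge two descendants of `base`, keyed by OID; on conflicting changes B wins."""
    merged = {}
    for oid in base.keys() | a.keys() | b.keys():
        v0 = base.get(oid, _MISSING)
        va = a.get(oid, _MISSING)
        vb = b.get(oid, _MISSING)
        changed_a = va is not _MISSING and va != v0  # added or modified in A
        changed_b = vb is not _MISSING and vb != v0  # added or modified in B
        deleted_a = v0 is not _MISSING and va is _MISSING
        deleted_b = v0 is not _MISSING and vb is _MISSING
        if changed_b:
            merged[oid] = vb   # B's change is kept, even if A deleted the element
        elif changed_a:
            merged[oid] = va   # changed only in A (B left it alone or deleted it)
        elif deleted_a or deleted_b:
            continue           # deleted on one side and unchanged on the other
        else:
            merged[oid] = v0   # untouched on both sides
    return merged


S0 = {1: "Emp", 2: "Name", 3: "Addr", 4: "Phone"}
S1 = {1: "Emp", 2: "FullName", 3: "Addr"}               # renames 2, deletes 4
S2 = {1: "Employee", 2: "Name", 4: "Phone", 5: "Dept"}  # renames 1, deletes 3, adds 5

S3 = three_way_merge(S0, S1, S2)
```

An element deleted on one side survives only if the other side changed it, which mirrors how ShouldDeleteA and ShouldDeleteB exclude elements found in ChangedB and ChangedA.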
Data Migration
• Given
  – a schema S and its database D
  – an evolved schema S′
• Produce
  – a procedure for mapping D into an S′ database D′
1. mapSS′ = Match(S, S′)
2. Use Enum(S) to generate a data migration script
[Figure: S and S′ are connected by mapSS′; Enum feeds Generate Migration Script, and running the script turns D into D′.]
Data Translation

• Like data migration, except S and S′ are expressed in different data models.
Outline
✓ Overview of Model Management
✓ Solutions to classical meta data problems
• Recent technical results
Status Report
• Vision
  – [Bernstein, Halevy, & Pottinger, SIGMOD Record 12/00]
• Data Warehouse Examples
  – [Bernstein & Rahm, ER ’00]
• Match Operation
  – Survey: [Rahm & Bernstein, MSR Tech Report]
  – Prototype: [Madhavan, Bernstein, & Rahm, VLDB ’01]
• Merge Operation
  – coming soon …
• Theory
  – [Alagić & Bernstein, DBPL ’01]
Schema Matching Approaches
• About a dozen published algorithms.
• Many good ideas, but none are robust.
[Figure: taxonomy of schema matching approaches]
• Individual matchers
  – Schema-based, Per-Element: Linguistic (names, descriptions); Constraint-based (types, keys)
  – Schema-based, Structural: Constraint-based (graph matching)
  – Content-based, Per-Element: Linguistic (IR: word frequencies, key terms); Constraint-based (value pattern and ranges)
• Combined matchers
  – Hybrid
  – Composite: manual composition, automatic composition
The CUPID Algorithm
• Computes linguistic similarity of element pairs
• Computes structural similarity of element pairs
• Generates a mapping
[Figure: schema PO, with POShipTo and POBillTo each holding City and Street, is matched against schema PurchaseOrder, with DeliverTo and InvoiceTo each holding an Address with City and Street; matching leaves increment the structural similarity of their parents (ssim++).]
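How a linguistic score and a structural score might be combined for one element pair can be sketched as below. This is not the Cupid implementation: the names `lsim`, `ssim`, `wsim`, and the weight `w_struct` are illustrative, the linguistic score is a plain character-level ratio from Python's `difflib` rather than a thesaurus-based measure, and the Address level of PurchaseOrder is flattened away for brevity.

```python
from difflib import SequenceMatcher


def lsim(a: str, b: str) -> float:
    """Linguistic stand-in: character-level similarity of the lowercased names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def ssim(children_a: list[str], children_b: list[str]) -> float:
    """Structural stand-in: average, over children of a, of the best lsim against children of b."""
    if not children_a or not children_b:
        return 0.0
    return sum(max(lsim(x, y) for y in children_b) for x in children_a) / len(children_a)


def wsim(a: str, b: str, schema_a: dict, schema_b: dict, w_struct: float = 0.5) -> float:
    """Weighted mix of structural and linguistic similarity for one element pair."""
    return w_struct * ssim(schema_a[a], schema_b[b]) + (1 - w_struct) * lsim(a, b)


po = {"POShipTo": ["City", "Street"], "POBillTo": ["City", "Street"]}
purchase_order = {"DeliverTo": ["City", "Street"], "InvoiceTo": ["City", "Street"]}

# Score every candidate pair; a mapping would keep the pairs above some threshold.
scores = {(a, b): wsim(a, b, po, purchase_order) for a in po for b in purchase_order}
```

Because all four elements have identical children, the structural term is the same for every pair here, so only the name similarity separates the candidates. That is exactly the case where a string ratio is too weak and a thesaurus (ShipTo vs. DeliverTo) would be needed.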
M3 = Merge(M1, map, M2)
• [Buneman, Davidson, Kosky, EDBT 92]
  – Meta-model has aggregation & generalization only
  – Do a union and collapse objects having the same name
  – Fix-up step for inconsistencies created by merging
  – Successive fixups lead to different results
  – Batch them at the end, to produce a unique minimal result
[Figure: small graphs over the nodes X, Y, Z, and W with edges labeled a, illustrating a merge and its fix-up.]
• Now enrich the meta-model (containment, complex mappings) & merge semantics (conflicts, deletes)
A Formal Semantics for Model Mgt
• Use category theory for a data-model-independent characterization of models and mappings
• Models and their DBs are categories
• Model and data transformations are morphisms
• Mappings between models & data are functors
• Utility
  – Define formal semantics for Match and Merge
  – Explain when Match & Merge preserve constraints
  – Check that implementation satisfies the semantics
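Categories
• Goal – a mathematical semantics of MM algebra
[Figure: the schemas Schm, Sch1, Sch2, and Sch12 are related by the morphisms f and g and by Match and Merge; the functor Db carries them to the databases Db(Sch1), Db(Sch2), and Db(Sch12), which are related by the morphisms p and q. Panel labels: Functor, Theory.]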
Implementation Vision
[Figure: architecture diagram. Components shown: a Model Manager with an MM Meta-Model and the operations Match, Merge, Compose, Copy, Apply, …; operation specializations; an inferencing engine; an object-oriented repository; an OR mapper; a SQL DBMS; a model-driven UI generator; and generic tools (browser, editors, import/export, catalogs, scripting). Sample models shown: a process model (Order Entry, Authorize Credit, Schedule Delivery, Bill Customer, Update Marketing, Inventory), an object model (Customer, Order, Scheduled Delivery, Product, Salesperson), and a relational fragment (cust, emp, dept, dno, dna).]
Related Work
• There’s a lot of it. Apply it to model management!
• Platforms – OODBs, datalog, deductive OODBs (Telos/ConceptBase, F-Logic)
• Inferencing on mappings – AQUV, description logic
• Transitive closure and recursive QP
• Differencing – text, trees, graphs
• Data translation – algebras, schema evolution
• Data integration – schema match, view generation
Summary
• Raise the level of abstraction of meta-data programming by using:
  – models and mappings as objects
  – an algebra that manipulates models and mappings on a generic meta-model
• Classical meta data problems can be expressed using this algebra
• Implementations of classic problems offer guidance on implementing the algebra
References
• http://www.research.microsoft.com/~philbe
• P. Bernstein & E. Rahm, “Data Warehouse Scenarios for Model Management”, ER 2000 Conference
• P. Bernstein, A. Levy, R. Pottinger, “A Vision for Management of Complex Models”, SIGMOD Record, Dec. 2000
• E. Rahm, P. Bernstein, “On Matching Schemas Automatically,” MSR Tech Report
• J. Madhavan, P. Bernstein, E. Rahm, “Generic Schema Matching with Cupid”, VLDB 2001
• S. Alagić, P. Bernstein, “A Model Theory for Generic Schema Management”, DBPL 2001