Schema mappings are logical assertions that describe the correspondence between two schemas
Higher-level, declarative programming constructs: they hide implementation details and allow for optimizations
Key elements in data exchange and data integration systems
Data exchange [FKMP03]: translate data conforming to a source schema S into data conforming to a target schema T so that the schema mapping M is satisfied
SPIDER: a Schema Mapping Debugger
Bogdan Alexe, Laura Chiticariu, Wang-Chiew Tan
University of California, Santa Cruz
Schema Mappings and Data Exchange
Example
Main Idea: Debugging Schema Mappings with Routes
Compute All Routes and Compute One Route
[Figure: source schema S with source instance I; target schema T with target instance J.]
Debugging a Data Exchange
Debugging schema mappings: the process of exploring, understanding and refining a schema mapping through the use of (test) data, at the level of the schema mapping itself
[Figure: the schema mapping M and its implementation in XQuery/SQL/Java.]
Approach 1: at the level of the implementation (unsatisfactory). It is specific to the exchange engine and to the implementation language, e.g., XQuery, XSLT. Commercial tools are available: Altova MapForce, Stylus Studio, etc.
Approach 2: at the level of the schema mapping (desirable). Currently, there is no support for it.
Motivation for debugging at the level of the schema mappings: uniformity in specifying and debugging, and reduced programming effort, since a user can both specify and debug at the level of schema mappings
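Source schema MANHATTAN-CREDIT:
CardHolders (cardNo, limit, ssn, name)
Dependents (accNo, ssn, name)
Target schema FARGO-FINANCE:
Accounts (accNo, creditLine, accHolder)
Clients (ssn, name)
(Figure arrows: D1 and D2 map the source schema to the target schema; C1 and the foreign key fk1 relate Accounts and Clients.)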
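Source instance I
CardHolders (cardNo, limit, ssn, name): (123, $15K, ID1, Alice), (456, $7K, ID2, Bob)
Dependents (accNo, ssn, name): (123, ID2, Bob)
Target instance J, a solution for I under the schema mapping
Accounts (accNo, creditLine, accHolder): (123, L1, ID1), (A2, L2, ID2), (456, L3, ID2)
Clients (ssn, name): (ID1, Alice), (ID2, Bob)
L1, L2, L3 and A2 are unknown values (labeled nulls) created during the exchange.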
D1: foreach s0 in MANHATTAN-CREDIT.CardHolders exists t0 in FARGO-FINANCE.Accounts, t1 in FARGO-FINANCE.Clients where t0.accHolder = t1.ssn with s0.cardNo = t0.accNo and s0.ssn = t0.accHolder and s0.name = t1.name
D2: foreach s0 in MANHATTAN-CREDIT.Dependents exists t0 in FARGO-FINANCE.Clients with s0.ssn = t0.ssn and s0.name = t0.name
C1: foreach s0 in FARGO-FINANCE.Clients exists t0 in FARGO-FINANCE.Accounts with s0.ssn = t0.accHolder
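One standard way to obtain a solution is the chase. The following is a minimal sketch, not Clio's or SPIDER's implementation: it repairs each violation of D1, D2 and C1 by adding the required target tuples, inventing labeled nulls N1, N2, ... for existential attributes. The dictionary encoding, the variable names, and the helpers `matches` and `chase` are hypothetical. With the order D1, D2, C1 the chase yields a smaller solution than the J above (it has no tuple like (A2, L2, ID2)); both satisfy all three dependencies.

```python
from itertools import count

# Hypothetical encoding: an instance maps relation names to sets of tuples;
# shared variable names inside a dependency encode its equalities.
I = {
    "CardHolders": {("123", "$15K", "ID1", "Alice"), ("456", "$7K", "ID2", "Bob")},
    "Dependents": {("123", "ID2", "Bob")},
}
# (name, lhs ranges over the source?, lhs atoms, rhs atoms)
DEPS = [
    ("D1", True, [("CardHolders", ("cn", "lim", "s", "n"))],
     [("Accounts", ("cn", "cl", "s")), ("Clients", ("s", "n"))]),
    ("D2", True, [("Dependents", ("an", "s", "n"))], [("Clients", ("s", "n"))]),
    ("C1", False, [("Clients", ("s", "n"))], [("Accounts", ("an", "cl", "s"))]),
]


def matches(atoms, inst, binding):
    """Yield every extension of `binding` that maps each atom to a tuple of `inst`."""
    if not atoms:
        yield binding
        return
    (rel, vs), rest = atoms[0], atoms[1:]
    for tup in inst.get(rel, ()):
        b = dict(binding)
        if all(b.setdefault(v, x) == x for v, x in zip(vs, tup)):
            yield from matches(rest, inst, b)


def chase(source, deps):
    """Naive chase: while some lhs match has no rhs witness in the target,
    add the rhs tuples, using a fresh labeled null for each existential variable."""
    target, fresh, changed = {}, count(1), True
    while changed:
        changed = False
        for _, from_source, lhs, rhs in deps:
            for b in list(matches(lhs, source if from_source else target, {})):
                if next(matches(rhs, target, b), None) is not None:
                    continue  # this binding is already satisfied
                for _, vs in rhs:
                    for v in vs:
                        if v not in b:  # existential variable: invent a null
                            b[v] = f"N{next(fresh)}"
                for rel, vs in rhs:
                    target.setdefault(rel, set()).add(tuple(b[v] for v in vs))
                changed = True
    return target


J = chase(I, DEPS)
print(J)
```

The chase terminates here because C1 only creates Accounts tuples, and no dependency reads Accounts on its left-hand side.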
Features
Computing routes for selected source or target data: compute all routes, compute one route, compute alternative routes on demand, guided exploration of all routes
Standard debugging features: breakpoints on dependencies, and watch windows that zoom into the details of each step in a route
Schema-level exploration of routes: facilitates the understanding of schema mappings directly at the level of the source and target schemas
Implementation details: built on top of the Clio data exchange system; supports relational and XML schema mappings; schema mapping language: XSML (XML Schema Mapping Language)
[Figure labels: source schema and target schema; source-to-target dependencies D1 and D2; target dependency C1.]
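Debugging scenario 1
Unknown credit limit? The Accounts tuple (123, L1, ID1) has an unknown credit line L1: the limit $15K is not copied over to the target.
[Figure: A route for the Accounts tuple (123, L1, ID1): via D1, the CardHolders tuple (123, $15K, ID1, Alice) produces the Accounts tuple (123, L1, ID1) and the Clients tuple (ID1, Alice).]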
Refined dependency, which also copies the credit limit:
D’1: foreach s0 in MANHATTAN-CREDIT.CardHolders exists t0 in FARGO-FINANCE.Accounts, t1 in FARGO-FINANCE.Clients where t0.accHolder = t1.ssn with s0.cardNo = t0.accNo and s0.ssn = t0.accHolder and s0.name = t1.name and s0.limit = t0.creditLine
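The effect of the refinement can be checked mechanically. This is a minimal sketch, assuming a hypothetical dictionary encoding of I and J in which the labeled nulls L1, L2, L3 and A2 are written as plain strings; `matches` and `violations` are illustrative helpers, not part of SPIDER. J satisfies D1, but both CardHolders tuples violate D’1, because D’1 reuses the limit variable as the credit line.

```python
# Hypothetical encoding of the example instances; labeled nulls are plain strings.
I = {
    "CardHolders": {("123", "$15K", "ID1", "Alice"), ("456", "$7K", "ID2", "Bob")},
    "Dependents": {("123", "ID2", "Bob")},
}
J = {
    "Accounts": {("123", "L1", "ID1"), ("A2", "L2", "ID2"), ("456", "L3", "ID2")},
    "Clients": {("ID1", "Alice"), ("ID2", "Bob")},
}
# (lhs atoms, rhs atoms); shared variable names encode the equalities.
D1 = ([("CardHolders", ("cn", "lim", "s", "n"))],
      [("Accounts", ("cn", "cl", "s")), ("Clients", ("s", "n"))])
# D'1 puts `lim` into Accounts instead of an existential `cl`.
D1_REFINED = ([("CardHolders", ("cn", "lim", "s", "n"))],
              [("Accounts", ("cn", "lim", "s")), ("Clients", ("s", "n"))])


def matches(atoms, inst, binding):
    """Yield every extension of `binding` that maps each atom to a tuple of `inst`."""
    if not atoms:
        yield binding
        return
    (rel, vs), rest = atoms[0], atoms[1:]
    for tup in inst.get(rel, ()):
        b = dict(binding)
        if all(b.setdefault(v, x) == x for v, x in zip(vs, tup)):
            yield from matches(rest, inst, b)


def violations(dep, source, target):
    """Return the lhs bindings for which the target has no rhs witness."""
    lhs, rhs = dep
    return [b for b in matches(lhs, source, {})
            if next(matches(rhs, target, b), None) is None]


print(len(violations(D1, I, J)))          # 0
print(len(violations(D1_REFINED, I, J)))  # 2
```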
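Debugging scenario 2
Unknown account number? In the Accounts tuple (A2, L2, ID2), 123 is not copied to the target as Bob’s account number.
[Figure: Route for the Accounts tuple (A2, L2, ID2): via D2, the Dependents tuple (123, ID2, Bob) produces the Clients tuple (ID2, Bob); via C1, that tuple produces the Accounts tuple (A2, L2, ID2).]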
Refined dependency, which joins Dependents with CardHolders on the account number:
D’2: foreach s0 in MANHATTAN-CREDIT.Dependents, s1 in MANHATTAN-CREDIT.CardHolders where s0.accNo = s1.cardNo exists t0 in FARGO-FINANCE.Clients, t1 in FARGO-FINANCE.Accounts where t1.accHolder = t0.ssn with s0.ssn = t0.ssn and s0.name = t0.name and s1.cardNo = t1.accNo and s1.limit = t1.creditLine
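Because D’2 determines every target attribute from the source, applying it amounts to a join. A minimal sketch with a hypothetical tuple encoding and a hypothetical helper `apply_d2_refined`: Bob now receives account number 123, and the credit line comes from the joined CardHolders tuple.

```python
def apply_d2_refined(dependents, card_holders):
    """Target tuples required by D'2: join Dependents and CardHolders on
    accNo = cardNo, then fill Clients and Accounts from the joined pair."""
    clients, accounts = set(), set()
    for acc_no, ssn, name in dependents:
        for card_no, limit, _holder_ssn, _holder_name in card_holders:
            if acc_no == card_no:                    # where s0.accNo = s1.cardNo
                clients.add((ssn, name))             # t0: Clients(ssn, name)
                accounts.add((card_no, limit, ssn))  # t1: accHolder = t0.ssn
    return clients, accounts


dependents = {("123", "ID2", "Bob")}
card_holders = {("123", "$15K", "ID1", "Alice"), ("456", "$7K", "ID2", "Bob")}
clients, accounts = apply_d2_refined(dependents, card_holders)
print(sorted(clients))   # [('ID2', 'Bob')]
print(sorted(accounts))  # [('123', '$15K', 'ID2')]
```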
[Figure: Forest of routes for the Accounts tuple (A2, L2, ID2), and the routes obtained from the forest. The Clients tuple (ID2, Bob) is produced either by D2 from the Dependents tuple (123, ID2, Bob), or by D1 from the CardHolders tuple (456, $7K, ID2, Bob), which also produces the Accounts tuple (456, L3, ID2); in both routes, C1 then produces the Accounts tuple (A2, L2, ID2).]
Schema-level exploration of routes
[Figure: the MANHATTAN-CREDIT and FARGO-FINANCE schemas with the dependencies D1, D2, C1 and the foreign key fk1, highlighted for a selected schema element.]
Towards a full-fledged debugger
SPIDER is the first prototype debugger for schema mappings
Routes illustrate how source and target data are related through the schema mapping
Declarative semantics, based on the logical satisfaction of the dependencies; independent of any implementation of the schema mapping. The concept applies to any mapping-based data exchange or data integration system.
Compute all routes: for each selected target tuple t, consider every possibility for witnessing t, and never consider the same tuple twice. Complete, polynomial-time algorithm. The route forest is a polynomial-size representation of all routes (possibly exponentially many) for the selected tuples. The computation can be user-guided, or stopped with breakpoints on dependencies.
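A simplified sketch of the idea, using a hypothetical encoding of the running example: `witnesses` enumerates every dependency step whose right-hand side produces a given target tuple, and `compute_all_routes` builds a route forest mapping each visited target tuple to its witnessing steps, expanding each tuple at most once. The enumeration is naive backtracking and is not meant to match the complexity of SPIDER's algorithm.

```python
# Hypothetical encoding of the running example; labeled nulls are plain strings.
I = {
    "CardHolders": {("123", "$15K", "ID1", "Alice"), ("456", "$7K", "ID2", "Bob")},
    "Dependents": {("123", "ID2", "Bob")},
}
J = {
    "Accounts": {("123", "L1", "ID1"), ("A2", "L2", "ID2"), ("456", "L3", "ID2")},
    "Clients": {("ID1", "Alice"), ("ID2", "Bob")},
}
# (name, lhs ranges over the source?, lhs atoms, rhs atoms)
DEPS = [
    ("D1", True, [("CardHolders", ("cn", "lim", "s", "n"))],
     [("Accounts", ("cn", "cl", "s")), ("Clients", ("s", "n"))]),
    ("D2", True, [("Dependents", ("an", "s", "n"))], [("Clients", ("s", "n"))]),
    ("C1", False, [("Clients", ("s", "n"))], [("Accounts", ("an", "cl", "s"))]),
]
FROM_SOURCE = {name: from_source for name, from_source, _, _ in DEPS}


def matches(atoms, inst, binding):
    """Yield every extension of `binding` that maps each atom to a tuple of `inst`."""
    if not atoms:
        yield binding
        return
    (rel, vs), rest = atoms[0], atoms[1:]
    for tup in inst.get(rel, ()):
        b = dict(binding)
        if all(b.setdefault(v, x) == x for v, x in zip(vs, tup)):
            yield from matches(rest, inst, b)


def facts(atoms, b):
    """Instantiate atoms under binding b as (relation, tuple) facts."""
    return [(rel, tuple(b[v] for v in vs)) for rel, vs in atoms]


def witnesses(t):
    """Every step (dependency, lhs facts, rhs facts) whose rhs produces fact t."""
    steps = []
    for name, from_source, lhs, rhs in DEPS:
        for b in matches(lhs, I if from_source else J, {}):
            for full in matches(rhs, J, b):
                if t in facts(rhs, full):
                    steps.append((name, facts(lhs, b), facts(rhs, full)))
    return steps


def compute_all_routes(t):
    """Route forest: each visited target fact -> the steps that witness it."""
    forest, todo = {}, [t]
    while todo:
        fact = todo.pop()
        if fact in forest:  # never consider the same tuple twice
            continue
        forest[fact] = witnesses(fact)
        for name, lhs_facts, _ in forest[fact]:
            if not FROM_SOURCE[name]:  # target dependency: its lhs facts need routes too
                todo.extend(lhs_facts)
    return forest


forest = compute_all_routes(("Accounts", ("A2", "L2", "ID2")))
for fact, steps in forest.items():
    print(fact, "<-", sorted(name for name, _, _ in steps))
```

For (A2, L2, ID2) this yields a single C1 step from Clients (ID2, Bob), which in turn has two witnesses, D1 and D2: the two routes of the forest above.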
Compute one route: non-exhaustive; compute all routes is adapted to stop as soon as one witness is found. Inference procedure: deduces all consequences of a proven tuple, which avoids recomputing "branches". Complete, polynomial-time algorithm.
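A sketch of the non-exhaustive variant under the same hypothetical encoding: `prove` takes the first witness whose target-side premises can themselves be proven, and once a step is chosen, every tuple it produces is recorded as proven, so later branches reuse it instead of recomputing it. This follows the slide's idea only loosely; in particular, the cycle handling here simply abandons a branch that revisits a tuple.

```python
# Hypothetical encoding of the running example; labeled nulls are plain strings.
I = {
    "CardHolders": {("123", "$15K", "ID1", "Alice"), ("456", "$7K", "ID2", "Bob")},
    "Dependents": {("123", "ID2", "Bob")},
}
J = {
    "Accounts": {("123", "L1", "ID1"), ("A2", "L2", "ID2"), ("456", "L3", "ID2")},
    "Clients": {("ID1", "Alice"), ("ID2", "Bob")},
}
DEPS = [  # (name, lhs ranges over the source?, lhs atoms, rhs atoms)
    ("D1", True, [("CardHolders", ("cn", "lim", "s", "n"))],
     [("Accounts", ("cn", "cl", "s")), ("Clients", ("s", "n"))]),
    ("D2", True, [("Dependents", ("an", "s", "n"))], [("Clients", ("s", "n"))]),
    ("C1", False, [("Clients", ("s", "n"))], [("Accounts", ("an", "cl", "s"))]),
]
FROM_SOURCE = {name: from_source for name, from_source, _, _ in DEPS}


def matches(atoms, inst, binding):
    """Yield every extension of `binding` that maps each atom to a tuple of `inst`."""
    if not atoms:
        yield binding
        return
    (rel, vs), rest = atoms[0], atoms[1:]
    for tup in inst.get(rel, ()):
        b = dict(binding)
        if all(b.setdefault(v, x) == x for v, x in zip(vs, tup)):
            yield from matches(rest, inst, b)


def facts(atoms, b):
    return [(rel, tuple(b[v] for v in vs)) for rel, vs in atoms]


def witnesses(t):
    """Steps (dependency, lhs facts, rhs facts) whose rhs produces fact t, in DEPS order."""
    return [(name, facts(lhs, b), facts(rhs, full))
            for name, from_source, lhs, rhs in DEPS
            for b in matches(lhs, I if from_source else J, {})
            for full in matches(rhs, J, b)
            if t in facts(rhs, full)]


def compute_one_route(t):
    """Return one route for target fact t as an ordered list of steps, or None."""
    route, proven = [], set()

    def prove(fact, visiting):
        if fact in proven:  # already deduced by an earlier step
            return True
        if fact in visiting:  # cycle: abandon this branch
            return False
        for name, lhs_facts, rhs_facts in witnesses(fact):
            # Source-to-target steps need nothing more; target steps need their lhs proven first.
            if FROM_SOURCE[name] or all(prove(f, visiting | {fact}) for f in lhs_facts):
                route.append((name, lhs_facts, rhs_facts))
                proven.update(rhs_facts)  # inference: all consequences of this step
                return True
        return False

    return route if prove(t, frozenset()) else None


route = compute_one_route(("Accounts", ("A2", "L2", "ID2")))
for name, lhs_facts, rhs_facts in route:
    print(name, lhs_facts, "->", rhs_facts)
```

Here the search stops after the first witness for Clients (ID2, Bob), so only one of the two routes in the forest is returned.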
[Figure: SPIDER takes the source schema S and source instance I, the target schema T and target instance J, the schema mapping M, and a tuple selection, and returns routes.]
Illustrate the schema mapping at the level of the source and target schemas
Future work: extension to handle nested schema mappings; adapting the target instance to changes in the schema mapping
Acknowledgements: Daniel Pepper, UC Santa Cruz; the Clio team at IBM Almaden Research Center