Schema mappings are logical assertions that describe the correspondence between two schemas



Schema mappings are logical assertions that describe the correspondence between two schemas
  Higher-level, declarative programming constructs
  Hide implementation details, allow for optimizations
  Key elements in data exchange and data integration systems

Data Exchange [FKMP03]
  Translate data conforming to a source schema S into data conforming to a target schema T so that the schema mapping M is satisfied

SPIDER: a Schema Mapping Debugger

Bogdan Alexe, Laura Chiticariu, Wang-Chiew Tan

University of California, Santa Cruz

Schema Mappings and Data Exchange

Example

Main Idea: Debugging Schema Mappings with Routes

Compute All Routes and Compute One Route

[Figure: source schema S, target schema T, source instance I, target instance J]

Debugging a Data Exchange

The process of exploring, understanding and refining a schema mapping through the use of (test) data at the level of schema mappings

Debugging Schema Mappings

[Figure labels: schema mapping M; implementation in XQuery/SQL/Java]

Approach 1: At the level of the implementation (unsatisfactory)
  Specific to the exchange engine
  Specific to the implementation language, e.g., XQuery, XSLT, etc.
  Commercial tools available: Altova MapForce, Stylus Studio, etc.

Approach 2: At the level of the schema mapping (desirable)
  Currently, NO SUPPORT!!!

Motivation for debugging at the level of the schema mappings:
  Uniformity in specifying and debugging
  Reduce programming effort by allowing a user to specify and debug at the level of schema mappings

Source schema: MANHATTAN CREDIT
  CardHolders(cardNo, limit, ssn, name)
  Dependents(accNo, ssn, name)

Target schema: FARGO FINANCE
  Accounts(accNo, creditLine, accHolder)
  Clients(ssn, name)

The schema figure carries the labels D1, D2, C1 and fk1.

Source instance I

  CardHolders
  cardNo  limit  ssn  name
  123     $15K   ID1  Alice
  456     $7K    ID2  Bob

  Dependents
  accNo  ssn  name
  123    ID2  Bob

Target instance J (solution for I under the schema mapping)

  Accounts
  accNo  creditLine  accHolder
  123    L1          ID1
  A2     L2          ID2
  456    L3          ID2

  Clients
  ssn  name
  ID1  Alice
  ID2  Bob

D1: foreach s0 in MANHATTAN-CREDIT.CardHolders
    exists t0 in FARGO-FINANCE.Accounts, t1 in FARGO-FINANCE.Clients
    where t0.accHolder = t1.ssn
    with s0.cardNo = t0.accNo and s0.ssn = t0.accHolder and s0.name = t1.name

D2: foreach s0 in MANHATTAN-CREDIT.Dependents
    exists t0 in FARGO-FINANCE.Clients
    with s0.ssn = t0.ssn and s0.name = t0.name

C1: foreach s0 in FARGO-FINANCE.Clients
    exists t0 in FARGO-FINANCE.Accounts
    with s0.ssn = t0.accHolder
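To make "J is a solution for I under the schema mapping" concrete, the following is a minimal sketch that checks the three dependencies against the example instances. The dictionary layout and the function names (satisfies_d1, satisfies_d2, satisfies_c1, is_solution) are assumptions made for illustration; they are not part of SPIDER or Clio.

```python
# Example instances, one list of tuples per relation.
# CardHolders(cardNo, limit, ssn, name), Dependents(accNo, ssn, name)
I = {
    "CardHolders": [("123", "$15K", "ID1", "Alice"), ("456", "$7K", "ID2", "Bob")],
    "Dependents": [("123", "ID2", "Bob")],
}
# Accounts(accNo, creditLine, accHolder), Clients(ssn, name)
J = {
    "Accounts": [("123", "L1", "ID1"), ("A2", "L2", "ID2"), ("456", "L3", "ID2")],
    "Clients": [("ID1", "Alice"), ("ID2", "Bob")],
}


def satisfies_d1(src, tgt):
    """D1: every card holder has an account with the same number and a client entry."""
    return all(
        (ssn, name) in tgt["Clients"]
        and any(acc_no == card_no and holder == ssn
                for acc_no, _line, holder in tgt["Accounts"])
        for card_no, _limit, ssn, name in src["CardHolders"]
    )


def satisfies_d2(src, tgt):
    """D2: every dependent appears as a client."""
    return all((ssn, name) in tgt["Clients"] for _acc_no, ssn, name in src["Dependents"])


def satisfies_c1(tgt):
    """C1: every client holds at least one account."""
    return all(
        any(holder == ssn for _acc_no, _line, holder in tgt["Accounts"])
        for ssn, _name in tgt["Clients"]
    )


def is_solution(src, tgt):
    return satisfies_d1(src, tgt) and satisfies_d2(src, tgt) and satisfies_c1(tgt)


print(is_solution(I, J))
```

The check only tests logical satisfaction; it says nothing about how J was produced, which is the point of debugging at the level of the mapping rather than the implementation.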

Features

Computing routes for selected source or target data
  Compute all routes
  Compute one route
  Compute alternative routes on demand
  Guided exploration of all routes

Standard debugging features
  Breakpoints on dependencies
  Watch windows: zoom into details about each step in the routes

Schema-level exploration of routes
  Facilitates the understanding of schema mappings directly at the level of source and target schemas

Implementation details
  On top of the Clio data exchange system
  Supports relational and XML schema mappings
  Schema mapping language: XSML (XML Schema Mapping Language)

D1 and D2 are source-to-target dependencies; C1 is a target dependency.

Debugging scenario 1

Unknown credit limit? $15K is not copied over to the target

[Figure: a route for the selected tuple Accounts(123, L1, ID1). Dependency D1, applied to a CardHolders tuple, yields Accounts(123, L1, ID1) and Clients(ID1, Alice).]

D’1: foreach s0 in MANHATTAN-CREDIT.CardHolders
     exists t0 in FARGO-FINANCE.Accounts, t1 in FARGO-FINANCE.Clients
     where t0.accHolder = t1.ssn
     with s0.cardNo = t0.accNo and s0.ssn = t0.accHolder and s0.name = t1.name and s0.limit = t0.creditLine
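The difference between D1 and the corrected dependency can be sketched as follows. Under D1 the credit line is existentially quantified, so an exchange has to invent an unknown value; with the extra equality on limit the source value is copied. The function name exchange_cardholders, the copy_limit flag and the placeholder names N1, N2 are hypothetical and chosen only for this illustration.

```python
from itertools import count


def exchange_cardholders(card_holders, copy_limit):
    """Produce Accounts and Clients tuples from CardHolders tuples.

    copy_limit=False mimics the original D1 (creditLine is an unknown value);
    copy_limit=True mimics the corrected dependency (creditLine = limit).
    """
    fresh = count(1)  # numbering for invented unknown values (hypothetical naming)
    accounts, clients = [], []
    for card_no, limit, ssn, name in card_holders:
        credit_line = limit if copy_limit else f"N{next(fresh)}"
        accounts.append((card_no, credit_line, ssn))
        clients.append((ssn, name))
    return accounts, clients


card_holders = [("123", "$15K", "ID1", "Alice"), ("456", "$7K", "ID2", "Bob")]

before, _ = exchange_cardholders(card_holders, copy_limit=False)
after, _ = exchange_cardholders(card_holders, copy_limit=True)
print(before)
print(after)
```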

Debugging scenario 2

Unknown account number? 123 is not copied to the target as Bob’s account number

Route for the Accounts tuple:
  Dependents(123, ID2, Bob) --D2--> Clients(ID2, Bob) --C1--> Accounts(A2, L2, ID2)

D’2: foreach s0 in MANHATTAN-CREDIT.Dependents, s1 in MANHATTAN-CREDIT.CardHolders
     where s0.accNo = s1.cardNo
     exists t0 in FARGO-FINANCE.Clients, t1 in FARGO-FINANCE.Accounts
     where t1.accHolder = t0.ssn
     with s0.ssn = t0.ssn and s0.name = t0.name and s1.cardNo = t1.accNo and s1.limit = t1.creditLine
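The corrected dependency joins Dependents with CardHolders on the account number, so the dependent's account number and credit line can be copied from the matching card. A minimal sketch of that join is given below; the function name apply_d2_prime is hypothetical, and the lookup assumes that cardNo is unique in CardHolders, which holds for the example data but is not stated in the source.

```python
def apply_d2_prime(dependents, card_holders):
    """Join Dependents with CardHolders on accNo = cardNo and emit target tuples."""
    # Assumption: cardNo identifies at most one CardHolders tuple.
    by_card_no = {holder[0]: holder for holder in card_holders}
    clients, accounts = [], []
    for acc_no, ssn, name in dependents:
        holder = by_card_no.get(acc_no)
        if holder is None:
            continue  # the dependency only ranges over matching pairs
        card_no, limit, _holder_ssn, _holder_name = holder
        clients.append((ssn, name))             # s0.ssn = t0.ssn, s0.name = t0.name
        accounts.append((card_no, limit, ssn))  # accNo and creditLine come from the card
    return clients, accounts


card_holders = [("123", "$15K", "ID1", "Alice"), ("456", "$7K", "ID2", "Bob")]
dependents = [("123", "ID2", "Bob")]

clients, accounts = apply_d2_prime(dependents, card_holders)
print(clients)
print(accounts)
```

With this dependency Bob's account as a dependent carries the number 123 and the credit line $15K instead of the unknown values A2 and L2.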

[Figure: forest of routes for the Accounts tuple (A2, L2, ID2). The tuple is witnessed by C1 from Clients(ID2, Bob), which is in turn witnessed by D1 or by D2.]

Routes obtained from the forest:
  Dependents(123, ID2, Bob) --D2--> Clients(ID2, Bob) --C1--> Accounts(A2, L2, ID2)
  CardHolders(456, $7K, ID2, Bob) --D1--> Accounts(456, L3, ID2), Clients(ID2, Bob) --C1--> Accounts(A2, L2, ID2)

Schema-level exploration of routes

[Figure: the MANHATTAN CREDIT and FARGO FINANCE schemas with the labels D1, D2, C1 and fk1, and a selected schema element highlighted.]

Towards a full-fledged debugger

SPIDER is the first prototype debugger for schema mappings

Routes illustrate the relationship between source and target data with the schema mapping

  Declarative semantics, based on the logical satisfaction of the dependencies
  Independent of any implementation of the schema mapping
  Concept applies to any mapping-based data exchange or data integration system

Compute all routes
  For each selected target tuple t, consider every possibility for witnessing t. Do not consider the same tuple twice.
  Complete, polynomial time algorithm
  The route forest is a polynomial representation of all routes (possibly exponentially many) for the selected tuples
  Computation can be user-guided, or stopped with breakpoints on dependencies
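The idea of expanding each tuple once and recording every way of witnessing it can be sketched on the running example. This is a simplified illustration, not SPIDER's algorithm: the three dependencies are hard-coded in a hypothetical witnesses function, facts are (relation, tuple) pairs, and the forest is a plain dictionary from each target fact to its list of (dependency, premises) branches.

```python
from collections import deque

I = {
    "CardHolders": [("123", "$15K", "ID1", "Alice"), ("456", "$7K", "ID2", "Bob")],
    "Dependents": [("123", "ID2", "Bob")],
}
J = {
    "Accounts": [("123", "L1", "ID1"), ("A2", "L2", "ID2"), ("456", "L3", "ID2")],
    "Clients": [("ID1", "Alice"), ("ID2", "Bob")],
}


def witnesses(fact):
    """Return every (dependency, premises) pair that can produce `fact` in J."""
    rel, t = fact
    found = []
    if rel == "Accounts":
        acc_no, _line, holder = t
        for c in I["CardHolders"]:
            # D1 also asserts the Clients tuple, which must be present in J.
            if c[0] == acc_no and c[2] == holder and (c[2], c[3]) in J["Clients"]:
                found.append(("D1", [("CardHolders", c)]))
        for cl in J["Clients"]:
            if cl[0] == holder:  # C1: a client implies some account for that holder
                found.append(("C1", [("Clients", cl)]))
    elif rel == "Clients":
        ssn, name = t
        for c in I["CardHolders"]:
            has_account = any(a[0] == c[0] and a[2] == ssn for a in J["Accounts"])
            if c[2] == ssn and c[3] == name and has_account:
                found.append(("D1", [("CardHolders", c)]))
        for d in I["Dependents"]:
            if d[1] == ssn and d[2] == name:
                found.append(("D2", [("Dependents", d)]))
    return found


def route_forest(selected):
    """Expand each target fact once, recording all of its witnessing branches."""
    forest = {}
    queue = deque(selected)
    while queue:
        fact = queue.popleft()
        if fact in forest:
            continue  # do not consider the same tuple twice
        forest[fact] = witnesses(fact)
        for _dep, premises in forest[fact]:
            for rel, t in premises:
                if rel in J:  # target premises need their own witnesses
                    queue.append((rel, t))
    return forest


forest = route_forest([("Accounts", ("A2", "L2", "ID2"))])
for fact, branches in forest.items():
    print(fact, branches)
```

Because every fact is expanded at most once and shared between branches, the dictionary stays small even when the number of distinct routes that can be read off it grows quickly.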

Compute one route
  Non-exhaustive: adapted compute all routes to stop when one witness is found
  Inference procedure: to deduce all consequences of a proven tuple and avoid recomputation of “branches”
  Complete, polynomial time algorithm
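A rough sketch of the "stop at the first witness" idea follows. To keep it short, the possible witnesses for the Accounts tuple (A2, L2, ID2) are given as a precomputed table rather than searched for, and the inference procedure is reduced to a set of already proven facts that are never re-derived; both simplifications, and all names used (WITNESSES, SOURCE_RELATIONS, compute_one_route), are assumptions for illustration only.

```python
SOURCE_RELATIONS = {"CardHolders", "Dependents"}

# Hypothetical precomputed witness table for the running example:
# target fact -> list of (dependency, premises).
WITNESSES = {
    ("Accounts", ("A2", "L2", "ID2")): [("C1", [("Clients", ("ID2", "Bob"))])],
    ("Clients", ("ID2", "Bob")): [
        ("D1", [("CardHolders", ("456", "$7K", "ID2", "Bob"))]),
        ("D2", [("Dependents", ("123", "ID2", "Bob"))]),
    ],
}


def _prove(fact, route, proven, active):
    """Depth-first search that stops at the first branch whose premises hold."""
    if fact in proven:
        return True   # already derived, do not recompute this branch
    if fact in active:
        return False  # guard against cyclic justifications
    active.add(fact)
    try:
        for dep, premises in WITNESSES.get(fact, []):
            if all(rel in SOURCE_RELATIONS or _prove((rel, t), route, proven, active)
                   for rel, t in premises):
                route.append((dep, premises, fact))
                proven.add(fact)
                return True
        return False
    finally:
        active.discard(fact)


def compute_one_route(fact):
    """Return one route as a list of (dependency, premises, produced fact) steps, or None."""
    route = []
    return route if _prove(fact, route, set(), set()) else None


route = compute_one_route(("Accounts", ("A2", "L2", "ID2")))
for step in route:
    print(step)
```

The search takes the first branch listed for each fact, so here it returns the route through D1; the alternative through D2 would only be explored on demand.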

[Figure: SPIDER works with the schema mapping M between source schema S and target schema T, the source instance I and the target instance J; tuple selections in I or J are answered with routes.]

Illustrate the schema mapping at the level of the source and target schemas

Future work
  Extension to handle nested schema mappings
  Adapt the target instance with changes in the schema mapping

Acknowledgements
  Daniel Pepper, UC Santa Cruz
  The Clio team in IBM Almaden Research Center