Pal gov.tutorial2.session12 2.architectural solutions for the integration issues

15
1 PalGov © 2011 فلسطينيةلكترونية الديمية الحكومة ا أكاThe Palestinian eGovernment Academy www.egovacademy.ps Tutorial II: Data Integration and Open Information Systems Session 12.2 Architectural Solutions for the Integration Issues Dr. Mustafa Jarrar University of Birzeit [email protected] www.jarrar.info

Transcript of Pal gov.tutorial2.session12 2.architectural solutions for the integration issues

Page 1: Pal gov.tutorial2.session12 2.architectural solutions for the integration issues

1PalGov © 2011

أكاديمية الحكومة اإللكترونية الفلسطينيةThe Palestinian eGovernment Academy

www.egovacademy.ps

Tutorial II: Data Integration and Open Information Systems

Session 12.2

Architectural Solutions for the Integration Issues

Dr. Mustafa Jarrar

University of Birzeit

[email protected]

www.jarrar.info

Page 2: Pal gov.tutorial2.session12 2.architectural solutions for the integration issues

2PalGov © 2011

About

This tutorial is part of the PalGov project, funded by the TEMPUS IV program of the

Commission of the European Communities, grant agreement 511159-TEMPUS-1-

2010-1-PS-TEMPUS-JPHES. The project website: www.egovacademy.ps

University of Trento, Italy

University of Namur, Belgium

Vrije Universiteit Brussel, Belgium

TrueTrust, UK

Birzeit University, Palestine

(Coordinator )

Palestine Polytechnic University, Palestine

Palestine Technical University, PalestineUniversité de Savoie, France

Ministry of Local Government, Palestine

Ministry of Telecom and IT, Palestine

Ministry of Interior, Palestine

Project Consortium:

Coordinator:

Dr. Mustafa Jarrar

Birzeit University, P.O.Box 14- Birzeit, Palestine

Telfax:+972 2 2982935 [email protected]

Page 3: Pal gov.tutorial2.session12 2.architectural solutions for the integration issues

3PalGov © 2011

© Copyright Notes

Everyone is encouraged to use this material, or part of it, but should

properly cite the project (logo and website), and the author of that part.

No part of this tutorial may be reproduced or modified in any form or by

any means, without prior written permission from the project, who have

the full copyrights on the material.

Attribution-NonCommercial-ShareAlike

CC-BY-NC-SA

This license lets others remix, tweak, and build upon your work non-

commercially, as long as they credit you and license their new creations

under the identical terms.

Page 4: Pal gov.tutorial2.session12 2.architectural solutions for the integration issues

PalGov © 2011 4

Tutorial Map

Topic h

Session 1: XML Basics and Namespaces 3

Session 2: XML DTD’s 3

Session 3: XML Schemas 3

Session 4: Lab-XML Schemas 3

Session 5: RDF and RDFs 3

Session 6: Lab-RDF and RDFs 3

Session 7: OWL (Ontology Web Language) 3

Session 8: Lab-OWL 3

Session 9: Lab-RDF Stores -Challenges and Solutions 3

Session 10: Lab-SPARQL 3

Session 11: Lab-Oracle Semantic Technology 3

Session 12_1: The problem of Data Integration 1.5

Session 12_2: Architectural Solutions for the Integration Issues 1.5

Session 13_1: Data Schema Integration 1

Session 13_2: GAV and LAV Integration 1

Session 13_3: Data Integration and Fusion using RDF 1

Session 14: Lab-Data Integration and Fusion using RDF 3

Session 15_1: Data Web and Linked Data 1.5

Session 15_2: RDFa 1.5

Session 16: Lab-RDFa 3

Intended Learning Objectives

A: Knowledge and Understanding

2a1: Describe tree and graph data models.

2a2: Understand the notation of XML, RDF, RDFS, and OWL.

2a3: Demonstrate knowledge about querying techniques for data

models as SPARQL and XPath.

2a4: Explain the concepts of identity management and Linked data.

2a5: Demonstrate knowledge about Integration &fusion of

heterogeneous data.

B: Intellectual Skills

2b1: Represent data using tree and graph data models (XML &

RDF).

2b2: Describe data semantics using RDFS and OWL.

2b3: Manage and query data represented in RDF, XML, OWL.

2b4: Integrate and fuse heterogeneous data.

C: Professional and Practical Skills

2c1: Using Oracle Semantic Technology and/or Virtuoso to store

and query RDF stores.

D: General and Transferable Skills2d1: Working with team.

2d2: Presenting and defending ideas.

2d3: Use of creativity and innovation in problem solving.

2d4: Develop communication skills and logical reasoning abilities.

Page 5: Pal gov.tutorial2.session12 2.architectural solutions for the integration issues

5PalGov © 2011

Module ILOs

After completing this module students will be able to:

- Explain different architectural solutions to the problem of data

integration.

Page 6: Pal gov.tutorial2.session12 2.architectural solutions for the integration issues

6PalGov © 2011

Architectural Solutions for the Integration

Issues

• Two families of solutions for the integration issue:

– Application-driven Integration

• Various types of middleware (e.g. Web Services, Remote

Procedure Call (RPC), Publish & Subscribe) that achieve

reconciliation through application to middleware communication

– Data-driven Integration

• Various types of data reconciliation and integration

– Consolidation

– Data Warehouse

– Data Integration

Page 7: Pal gov.tutorial2.session12 2.architectural solutions for the integration issues

7PalGov © 2011

Architectures of application-driven

Integration

e.g., Service Oriented Architecture

. . . . . .MSG-1

ASSS

ASSS

ASSS

ASSS

ASSS

ASSS. . .

LegendSS = Security ServerAS = Adapter ServerMSG = Data Message

MSG-N

Page 8: Pal gov.tutorial2.session12 2.architectural solutions for the integration issues

8PalGov © 2011

Source 1Source 2

Source nApplication 1 Application 2 Application n

Middleware

1

2

347

5

6

Update of an object O

PublishesSubscribes

Architectures of application-driven

Integration

e.g., Publish-Subscribe ArchitectureTypical application-driven integration architecture for integration of updates.

Source: Carlo Batini

Page 9: Pal gov.tutorial2.session12 2.architectural solutions for the integration issues

9PalGov © 2011

Information Integration Architectures

Source 1

Source 2

Source n

…..

Source 2

Source 1

Source n

Unique DB

New architecture

once for all

Consolidation

Source: Carlo Batini

Page 10: Pal gov.tutorial2.session12 2.architectural solutions for the integration issues

10PalGov © 2011

Information Integration Architectures

Source 1

Source 2

Source n

…..

Unique DB

New architecture:

periodically updated

Data Warehouse

middleware

New data base

Data Warehouse

Source: Carlo Batini

Page 11: Pal gov.tutorial2.session12 2.architectural solutions for the integration issues

11PalGov © 2011

Information Integration Architectures

Virtual Data Integration

Source 1

Source 2

Source n

…..

Mediator

Local

schema

Local

schema

Local

schema

Local

schemaLocal

schemaLocal

schema

Global

schema

New architectureNo new data base!

Source: Carlo Batini

Page 12: Pal gov.tutorial2.session12 2.architectural solutions for the integration issues

12PalGov © 2011

The integration problem…

Source 2

Source 1Registry

of clients 1

Source 3

Source 4

Source n

…..

Which kind of

integration?

New

architecture

Registry

of clients 2

Retail

sales

On line

sales

Other

How to decide?

Source: Carlo Batini

Additional Reading

Page 13: Pal gov.tutorial2.session12 2.architectural solutions for the integration issues

13PalGov © 2011

Criteria to be adopted

• autonomy, the degree of independence between the different data

base administrators in their design choices;

• relevance of historical data, and consequent need to periodically store

new data without deleting the old ones;

• query complexity, in terms of amount of data and tables visited and

number of operators on them, and consequent time complexity in

query execution;

• relevance of currency in queries, the need for queries to extract current

data;

• economic value of integration, the relevance of having integrated

information in input for business operational and decisional processes

in order to produce effective outputs;

Source: Carlo Batini

Additional Reading

Page 14: Pal gov.tutorial2.session12 2.architectural solutions for the integration issues

14PalGov © 2011

Criteria to be adopted

• volatility of sources, frequency of adding or deleting sources, and

frequency of change of source schemas;

• relevance of queries w.r.t transactions, relative importance and

frequency of queries with respect to changes in data;

• management complexity, the effort to be spent in management

activities related to databases and hw-sw infrastructures, due to the

corresponding complexity of the organizations using the data bases;

• costs of heterogeneity, hidden and explicit costs related to business

processes that are due to making use of heterogeneous data.

Source: Carlo Batini

Additional Reading

Page 15: Pal gov.tutorial2.session12 2.architectural solutions for the integration issues

15PalGov © 2011

References

• Carlo Batini: Course on Data Integration. BZU IT Summer School

2011.

• Stefano Spaccapietra: Information Integration. Presentation at the IFIP

Academy. Porto Alegre. 2005.

• Chris Bizer: The Emerging Web of Linked Data. Presentation at SRI

International, Artificial Intelligence Center. Menlo Park, USA. 2009.