Peer Data Management, Concluded and Model Management
Zachary G. Ives, University of Pennsylvania
CIS 650 – Database & Information Systems
April 18, 2005
Administrivia
Next readings and summaries: Dong and Halevy on Personal Information Management
2-paragraph summary of the problems they focus on and their key contributions
From Piazza to pizza … and scheduling
Today’s Trivia Question
Our Discussion
The Semantic Web (SW) as originally posed: RDF as the “semantic” format
Also RDFS schema format
Ontologies as the standard way of defining concepts
Description logics are the way most ontologies are defined (OWL language)
Piazza PDMS:
Relations and views
Query language as mapping language
Transitive closure of composition of mappings
Peer Data Management: Decentralized Mediation for Ad Hoc Extensibility
[Figure: example peers, the DB projects at UPenn, UW, Stanford, and IIT Mumbai]
Data integration: 1 mediated schema, m mappings to sources
Peer data management system (PDMS): n mediated “peer schemas” with as few as n − 1 mappings between them, evaluated transitively, plus m mappings to sources
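To make “evaluated transitively” concrete, here is a minimal Python sketch that finds the chain of peer mappings a query posed at one peer would follow to reach every other peer. The mapping topology is hypothetical (the slide does not say which peers are linked), and each mapping is assumed to be usable in both directions; with n = 4 peers, n − 1 = 3 mappings are enough to connect them all.

```python
from collections import deque


def mapping_paths(peers, mappings, start):
    """Breadth-first search over peer mappings.

    Returns {peer: [start, ..., peer]}, the chain of peers whose mappings
    would be composed to carry a query from `start` to that peer.
    """
    adj = {p: [] for p in peers}
    for a, b in mappings:
        adj[a].append(b)
        adj[b].append(a)  # assumption: every mapping can be used in both directions
    paths = {start: [start]}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for nxt in adj[cur]:
            if nxt not in paths:
                paths[nxt] = paths[cur] + [nxt]
                queue.append(nxt)
    return paths


peers = ["UPenn", "UW", "Stanford", "IIT Mumbai"]
# Hypothetical chain of n - 1 mappings between the peer schemas
mappings = [("UPenn", "UW"), ("UW", "Stanford"), ("Stanford", "IIT Mumbai")]

paths = mapping_paths(peers, mappings, "UPenn")
for peer, path in paths.items():
    print(peer, "via", " -> ".join(path))
```

A real PDMS composes the mapping definitions along each path during query reformulation; the sketch only shows which compositions would be needed.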
Example Rule-Goal Tree Expansion
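q: Q(a1, a2) :- SameProject(a1,a2,p), Author(a1,w), Author(a2,w)
[Figure: rule-goal tree for q]
Goals under q: SameProject(a1,a2,p), Author(a1,w), Author(a2,w); rule nodes r0, r1, r1
Next-level goals: ProjMember(a1,p), ProjMember(a2,p), CoAuthor(a1,a2), CoAuthor(a2,a1); rule nodes r3, r3, r2, r2
Source goals: S1(a1,p,_), S1(a2,p,_), S2(a1,a2), S2(a2,a1)
Resulting rewriting (a union of two conjunctive queries):
Q’(a1,a2) :- S1(a1,p,_), S1(a2,p,_), S2(a1,a2)
Q’(a1,a2) :- S1(a1,p,_), S1(a2,p,_), S2(a2,a1)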
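The expansion can be sketched in Python. The tree is encoded by hand (a hypothetical encoding, not Piazza's data structures), since the definitions of rules r0 to r3 are not on the slide. Each rule node records which sibling goals it covers; consistent with the two rewritings of Q’, each r1 node is assumed to cover both Author goals with a single CoAuthor goal. A rewriting chooses rule nodes that cover every goal exactly once and recurses until only source atoms remain.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import product


@dataclass
class Goal:
    atom: str
    is_source: bool = False  # source relations are the leaves of the tree


@dataclass
class Rule:
    name: str
    covers: frozenset  # positions of the sibling goals this rule node answers
    subgoals: Level


@dataclass
class Level:
    goals: list
    rules: list = field(default_factory=list)


def exact_covers(targets, rules):
    """Yield lists of rule nodes whose cover sets partition `targets`."""
    if not targets:
        yield []
        return
    first = min(targets)  # cover the lowest open position first, avoiding duplicates
    for r in rules:
        if first in r.covers and r.covers <= targets:
            for rest in exact_covers(targets - r.covers, rules):
                yield [r] + rest


def rewritings(level):
    """Return every conjunction of source atoms obtainable by expanding `level`."""
    sources = [g.atom for g in level.goals if g.is_source]
    targets = frozenset(i for i, g in enumerate(level.goals) if not g.is_source)
    results = []
    for cover in exact_covers(targets, level.rules):
        for combo in product(*(rewritings(r.subgoals) for r in cover)):
            results.append(sources + [atom for part in combo for atom in part])
    return results


def leaf(atom):
    return Level([Goal(atom, is_source=True)])


project = Level(
    [Goal("ProjMember(a1,p)"), Goal("ProjMember(a2,p)")],
    [Rule("r3", frozenset({0}), leaf("S1(a1,p,_)")),
     Rule("r3", frozenset({1}), leaf("S1(a2,p,_)"))],
)
tree = Level(
    [Goal("SameProject(a1,a2,p)"), Goal("Author(a1,w)"), Goal("Author(a2,w)")],
    [
        Rule("r0", frozenset({0}), project),
        # each r1 node covers BOTH Author goals (assumed, see lead-in)
        Rule("r1", frozenset({1, 2}),
             Level([Goal("CoAuthor(a1,a2)")], [Rule("r2", frozenset({0}), leaf("S2(a1,a2)"))])),
        Rule("r1", frozenset({1, 2}),
             Level([Goal("CoAuthor(a2,a1)")], [Rule("r2", frozenset({0}), leaf("S2(a2,a1)"))])),
    ],
)

for body in rewritings(tree):
    print("Q'(a1,a2) :- " + ", ".join(body))
```

Running it prints the two conjunctive queries of Q’, one per choice of r1 node.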
RDF vs. XML
RDF explicitly names relationships:
(book, title, “ABC”)
(book, writtenBy, author)
(author, name, “John Smith”)
XML does not always:
1. <book> <title>ABC</title> <writtenBy> <author><name>John Smith</name></author> </writtenBy> </book>
2. <book> <title>ABC</title> <author>John Smith</author> </book>
[Figure: the RDF graph, with edges book -title-> “ABC”, book -writtenBy-> author, author -name-> “John Smith”]
RDF vs. XML 2
RDF is subject-neutral (a graph); XML centers around a subject (a tree):
1. <book> <title>ABC</title> <author>John Smith</author> </book>
2. <author> <name>John Smith</name> <book>ABC</book> </author>
This may result in duplication of contained objects
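To see the duplication concretely, the sketch below stores a few facts as subject-neutral triples and serializes them as an author-centered XML tree with Python's xml.etree.ElementTree. The data is hypothetical: a second author, “Jane Doe”, is invented so that one book has two authors. In the tree, the single book is copied under each of its authors.

```python
import xml.etree.ElementTree as ET

# Hypothetical facts as (subject, property, object) triples
triples = [
    ("book1", "title", "ABC"),
    ("book1", "writtenBy", "author1"),
    ("book1", "writtenBy", "author2"),
    ("author1", "name", "John Smith"),
    ("author2", "name", "Jane Doe"),
]


def value(subject, prop):
    """Return the object of the first triple with this subject and property."""
    return next(o for s, p, o in triples if s == subject and p == prop)


def author_centered():
    """Serialize the graph as a tree rooted at authors; each book is nested under every author."""
    root = ET.Element("authors")
    for a in sorted({o for _, p, o in triples if p == "writtenBy"}):
        a_el = ET.SubElement(root, "author")
        ET.SubElement(a_el, "name").text = value(a, "name")
        for s, p, o in triples:
            if p == "writtenBy" and o == a:
                ET.SubElement(a_el, "book").text = value(s, "title")
    return root


tree = author_centered()
print(ET.tostring(tree, encoding="unicode"))
print(len(tree.findall("author/book")))  # 2: the one book "ABC" appears twice
```

The triples state that book1 has two authors exactly once; only the tree serialization repeats the book.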
An XML Version of the Semantic Web
Data model: XML + Schema
Vast volumes of data already in XML (or exported as XML)
CAVEAT: not all relationships are labeled in XML (“XML has no semantics.”)
Concepts: views ≈ classes; schemas ≈ ontologies
Views define membership via queries; can reason about containment
CAVEAT: less expressive than OWL classes
Schema mappings: target schema as a query over the source
Sophisticated reasoning about mappings is possible by extending existing data integration techniques
Can use mappings in “forward” and “reverse” directions
Allows for “chaining” of mappings to answer queries
Piazza with XML (WWW03)
Goals:
Build on XQuery and XML (extended with RDF-style identity, following the lead of [Patel-Schneider & Simeon 02])
Remain computationally inexpensive
Capture the common mapping types
Directional mapping language based on templates:
<output> {: $var IN document("doc")/path WHERE condition :}
  <tag>{$var}</tag></output>
Translates between parts of data instances
Restricted subset of XQuery that’s decidable to reason about
Supports special annotations and object fusion
Can map XML-XML, XML-RDF, RDF-XML (at the data level)
Mapping Example between XML Schemas
Target schema:
pubs
  book*
    title
    author*
      name
Source schema:
authors
  author*
    full-name
    publication*
      title
      pub-type
Example Piazza Mapping
<pubs>
  <book piazza:id={$t}>
  {: $a IN document("…")/authors/author,
     $an IN $a/full-name,
     $t IN $a/publication/title,
     $typ IN $a/publication/pub-type
     WHERE $typ = "book"
     PROPERTY $t >= 'A' AND $t < 'B' :}
    <title>{$t}</title>
    <author><name>{$an}</name></author>
  </book>
</pubs>
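A rough Python emulation of what this mapping produces, using xml.etree.ElementTree on a small hypothetical source document. It is a sketch under stated assumptions, not Piazza's implementation: title and pub-type are read from the same publication element; piazza:id={$t} is emulated by fusing all book elements that share a title (the attribute itself is not emitted); and the PROPERTY clause is read as metadata for reasoning about the mapping, so it is not applied as a filter.

```python
import xml.etree.ElementTree as ET

# Hypothetical source instance conforming to the source schema
SOURCE = """<authors>
  <author>
    <full-name>John Smith</full-name>
    <publication><title>ABC</title><pub-type>book</pub-type></publication>
    <publication><title>Another Paper</title><pub-type>article</pub-type></publication>
  </author>
  <author>
    <full-name>Jane Doe</full-name>
    <publication><title>ABC</title><pub-type>book</pub-type></publication>
  </author>
</authors>"""


def apply_mapping(source_root):
    pubs = ET.Element("pubs")
    books = {}  # title -> <book>; emulates fusion on piazza:id={$t} (assumption)
    for a in source_root.findall("author"):          # $a IN .../authors/author
        an = a.findtext("full-name")                  # $an IN $a/full-name
        for p in a.findall("publication"):
            t = p.findtext("title")                   # $t
            if p.findtext("pub-type") != "book":      # WHERE $typ = "book"
                continue
            book = books.get(t)
            if book is None:
                book = ET.SubElement(pubs, "book")
                ET.SubElement(book, "title").text = t
                books[t] = book
            author = ET.SubElement(book, "author")
            ET.SubElement(author, "name").text = an
    return pubs


result = apply_mapping(ET.fromstring(SOURCE))
print(ET.tostring(result, encoding="unicode"))
```

Because both authors wrote “ABC”, fusion yields one book element with two author children; the article is dropped by the WHERE condition.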
Challenges
Query reformulation for XML is significantly harder:
Hierarchy, 1:n schema constraints, the ability to map from values to tags, …
Redundant paths
Can only handle roughly the XML equivalent of conjunctive queries
See the WWW03 paper (plus later work by Yu and Popa, Deutsch et al., many others) for details
What about Values?
Thus far, we’ve focused on schema mappings
Almost as important in the real world: mappings of values to values
Proteins to binding sites
SSNs to customer IDs
etc.
The Hyperion system (KAM 03) focuses on computing transitive relationships between mappings
In many cases, we only have partial transitive mappings
Key idea: divide all of the mappings into partitions, each of which can compute transitive closures separately
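At its simplest, composing two value-mapping tables is a join on the value domain they share. The Python sketch below uses hypothetical identifiers and is not Hyperion's algorithm; in particular, it does not reproduce the partitioning scheme. It shows how a partial table drops values from the composed mapping and how many-to-many entries fan out.

```python
from functools import reduce


def compose(m1, m2):
    """Compose value mappings: {(a, c) | (a, b) in m1 and (b, c) in m2}."""
    index = {}
    for b, c in m2:
        index.setdefault(b, set()).add(c)
    return {(a, c) for a, b in m1 for c in index.get(b, ())}


def compose_chain(tables):
    """Compose a chain of mapping tables left to right."""
    return reduce(compose, tables)


# Hypothetical tables: SSN -> customer ID -> account ID
ssn_to_cust = {("ssn-1", "C1"), ("ssn-2", "C2"), ("ssn-3", "C3")}
cust_to_acct = {("C1", "A10"), ("C1", "A11"), ("C2", "A20")}  # C3 has no entry: partial

ssn_to_acct = compose_chain([ssn_to_cust, cust_to_acct])
print(sorted(ssn_to_acct))  # [('ssn-1', 'A10'), ('ssn-1', 'A11'), ('ssn-2', 'A20')]
```

ssn-3 vanishes from the result because its customer has no account entry, which is exactly the “partial transitive mapping” situation.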
Assessment: The Semantic Web
The KB world focuses on expressively capturing concepts
The DB world focuses on integrating and restructuring data (but views are less expressive in certain ways)
Do either of these seem likely to change the world?
What barriers need to be removed?
From Managing the Web as a Database to Managing Databases of Databases
Many common operations arise in:
Data integration
Data interchange
Schema design
Semantic Web
Schema maintenance/evolution
For instance:
Creating a mediated schema
Defining mappings between schemas
Seeing what’s different between schemas
The vision: let’s build a system to manage metadata, not data!
Metadata Management
The challenges:
There are lots of metadata representations
Different data models; different definition types (e.g., Java classes, XML Schemas, SQL DDL, …)
Many of the problems are unsolvable in the abstract (e.g., schema matching)
But maybe we can customize tools for each task
And maybe we can get user input to help
We want to create a clean, composable model of operators
Should be “algebraic” in some sense, with nice properties
Operators need to be generic but extensible
Data vs. Metadata vs. …
Data: we know what this is
Metadata (models): schemas, types, classes, etc.
Metamodels: things like the relational model, O-R model, …
Bernstein focuses on managing models, with customization for each metamodel (and perhaps special domains)
Models
A model is a set of objects with identity
Objects have at least extended ER-style traits:
attributes/properties
is-a and has-a relationships
loose associations
All of these are assumed to have types
Mappings
A mapping describes a correspondence between parts of two models; it may be annotated with information about computing the transformation
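[Figure: a mapping Map_ee between model Emp (Emp#, Name, Address) and model Employee (EmployeeID, FirstName, LastName, Phone); mapping element 1 is marked “=” and mapping element 2 is marked “≈”]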
The Basic Algebraic Operators
Match: basically schema matching; takes two models and returns a mapping between them
Elementary vs. complex match; reliance on morphisms
Compose: takes two mappings and composes them
Diff: takes a model A and a mapping A → B, and returns the part of A that’s not mapped
ModelGen: takes model A, creates a new model B plus a mapping A → B
Merge: takes models A and B and a mapping between them; returns the union C, plus mappings A → C and B → C
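To give the operators a concrete shape, here is a toy Python sketch in which a model is reduced to a set of element names and a mapping to a set of (source element, target element) pairs. This is a drastic simplification of the models above, which have typed objects, relationships, and annotated mappings. match is a naive name-equality stand-in, not LSD or any real matcher, and the example mapping assumes, for illustration only, that element 1 (=) of Map_ee relates Emp# to EmployeeID; the Staff model is hypothetical.

```python
def match(a, b):
    """Naive Match stand-in: pair elements whose names agree ignoring case."""
    return {(x, y) for x in a for y in b if x.lower() == y.lower()}


def compose(m_ab, m_bc):
    """Compose mappings A -> B and B -> C into A -> C."""
    return {(x, z) for x, y1 in m_ab for y2, z in m_bc if y1 == y2}


def diff(a, m_ab):
    """Part of model A that the mapping A -> B does not touch."""
    return frozenset(a) - {x for x, _ in m_ab}


def model_gen(a, translate):
    """Generate model B by translating each element of A; also return A -> B."""
    m_ab = {(x, translate(x)) for x in a}
    return frozenset(y for _, y in m_ab), m_ab


def merge(a, b, m_ab):
    """Union C of A and B, collapsing each mapped pair into A's element.

    Returns (C, A -> C, B -> C). Assumes m_ab is one-to-one and that
    unmapped names in A and B do not collide.
    """
    rename = {y: x for x, y in m_ab}
    m_ac = {(x, x) for x in a}
    m_bc = {(y, rename.get(y, y)) for y in b}
    c = frozenset(a) | frozenset(z for _, z in m_bc)
    return c, m_ac, m_bc


emp = frozenset({"Emp#", "Name", "Address"})
employee = frozenset({"EmployeeID", "FirstName", "LastName", "Phone"})
map_ee = {("Emp#", "EmployeeID")}  # hypothetical: only the "=" correspondence

print(sorted(diff(emp, map_ee)))                      # ['Address', 'Name']
c, m_ac, m_bc = merge(emp, employee, map_ee)
print(sorted(c))
print(compose(map_ee, {("EmployeeID", "StaffNo")}))  # {('Emp#', 'StaffNo')}
print(match(employee, frozenset({"phone"})))          # {('Phone', 'phone')}
upper, m_up = model_gen(emp, str.upper)
print(sorted(upper))                                  # ['ADDRESS', 'EMP#', 'NAME']
```

Even in this toy form, the hard cases listed on the next slides show up: the “≈” correspondence between Name and FirstName/LastName cannot be expressed as a simple pair, so merge keeps them as separate elements.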
Model Management in Action
Schematic of Changes
[Figure: schematic of changes, annotated “the new parts in S2 that need to be propagated to d2”, “Dest. w/o deleted items from s1”, and “the XML version of s2”]
Actual Operations
What’s Hard?
Match: we saw that LSD is far from perfect, and it’s the best out there…
Merge: can we make (A merge B) merge C = A merge (B merge C)? (Buneman, Davidson, Kosky 92)
Diff: how do we ensure a well-formed model as the result? Their answer: return a copy of the model, plus mappings showing what is actually part of the diff
Compose: composition isn’t always closed within the mapping language!
More Challenges
What about:
Semantics of the meta-model: how do we handle, e.g., constraints?
What to do about approximate correspondences?
Can we actually make these things generic but expressive enough to be useful?
Do you think this vision is feasible?