FOSDEM 2013: Petra Selmer, Flexible querying of graph data
Graph processing room
FOSDEM, 2 Feb 2013
Petra Selmer
http://www.dcs.bbk.ac.uk/~lselm01/
Flexible querying of graph data
Introduction
I shall be presenting my PhD topic, which involves a declarative query language allowing for the flexible querying of graph-structured data with complex paths.
Agenda
Who (am I)?
Why (the motivation)?
Some background info
What (is the query language and what can it do)?
Illustrative examples
How (is it done)?
Who?
Petra Selmer
Part-time PhD student:
Birkbeck College, University of London
Prof. Alexandra Poulovassilis
Dr. Peter T. Wood
Software Architect:
University College London’s Institute of Neurology
(Wellcome Trust Centre for Neuroimaging)
Why?
The amount of graph-structured data is growing fast
The structure of this data is becoming more complex, especially when multiple, heterogeneous data sources are integrated
The structure of the data is also always subject to change...
Why?
Users of such systems may not be familiar with the underlying data structure: available paths, etc.
The user may not be able to obtain meaningful answers (or indeed, any answers) from the data if the querying system is limited to exact matching of users’ queries
Also, the user may wish to explore the data by starting from a set of initial answers and proceeding from there
The user may additionally wish to derive some intelligence from the connections...
[Diagram: the user, the data, the query]
Background: Ontologies
Currently part of the Semantic Web stack (Tim Berners-Lee, RDF, triple stores)
Models a domain of interest: inferences, reasoning...
It can be thought of as a “schema” for graph data
The following inference rules are included (among others):
• Subclass: ‘History’, ‘Languages’ are subclasses of ‘Humanities’
• Subproperty, Domain, Range...
What?
Data model: G = (V, E)
Very general model
• V: vertices (or nodes), each labelled with some constant
• E: directed, labelled edges; labels drawn from the alphabet Σ ∪ {‘type’}
The query language is called Flex-It (it is declarative)
The basis is that of conjunctive regular path queries
There are two operators which may be applied to the original query: APPROX and RELAX
What?
Conjunctive regular path queries:
The paths to be traversed in the graph are expressed with a regular expression
A single regular path query conjunct: (X, R, Y)
• X, Y: either constants or variables
• R: the regular expression
“Conjunctive”: joining multiple conjuncts; e.g. (X, R1, Y), (Y, R2, Z), (Z, R3, A)
• The Y’s are matched, the Z’s are matched, etc.
Example graph: N1 -n-> N2 -n-> N3 -p-> N4
1) (N1, n+, ?Y): Y = N2, N3
2) (N1, n*p, ?Y): Y = N4
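The two example conjuncts can be checked with a small sketch. It is not the Flex-It implementation: the automata for n+ and n*p are written out by hand as assumptions, and the function name `regular_path_query` is illustrative. It explores pairs of (graph node, automaton state) breadth-first and reports every node reached in an accepting state.

```python
from collections import deque

# Example graph from the slide: N1 -n-> N2 -n-> N3 -p-> N4
EDGES = [("N1", "n", "N2"), ("N2", "n", "N3"), ("N3", "p", "N4")]

# Hand-built automata (an assumption of this sketch):
# (transitions, start state, accepting states), where
# transitions maps (state, label) -> set of next states.
NFA_N_PLUS = ({(0, "n"): {1}, (1, "n"): {1}}, 0, {1})     # n+
NFA_N_STAR_P = ({(0, "n"): {0}, (0, "p"): {1}}, 0, {1})   # n*p


def regular_path_query(edges, source, nfa):
    """Return the nodes reachable from `source` by a path whose labels the NFA accepts."""
    transitions, start, accepting = nfa
    outgoing = {}
    for u, label, v in edges:
        outgoing.setdefault(u, []).append((label, v))

    seen = {(source, start)}
    queue = deque(seen)
    answers = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            answers.add(node)
        # Follow a graph edge only if the automaton has a transition on its label.
        for label, next_node in outgoing.get(node, []):
            for next_state in transitions.get((state, label), ()):
                pair = (next_node, next_state)
                if pair not in seen:
                    seen.add(pair)
                    queue.append(pair)
    return answers


print(sorted(regular_path_query(EDGES, "N1", NFA_N_PLUS)))    # ['N2', 'N3']
print(sorted(regular_path_query(EDGES, "N1", NFA_N_STAR_P)))  # ['N4']
```

Tracking visited (node, state) pairs keeps the search finite even when the graph contains cycles.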
What?
Approximation allows for the approximate matching of labels in the path
Edit operations are applied to the edge labels in the path denoted by the regular expression:
• Edit operations: insertions, deletions, inversions, substitutions and transpositions of labels
• Each operation has a ‘cost’: usually 1
Example: Query conjunct: (X, a*.b, Y)
• R = a*.b [answers returned at cost 0]
• R’ = p.a*.b (insertion of ‘p’) [answers returned at cost 1]
• R’’ = p.a*.b- (inversion of ‘b’) [answers returned at cost 2]
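The costs in this example can be reproduced with a rough sketch, which is not the talk's algorithm. It assumes a hand-built automaton for a*.b, writes a path as a list of labels with a trailing "-" marking an edge traversed in reverse, and charges 1 for each insertion, deletion, substitution or inversion. Transpositions are left out, and an inversion is simply treated as a cost-1 mismatch. The name `approx_cost` is illustrative.

```python
import heapq

# Hand-built automaton for a*.b (an assumption): (state, label) -> next states.
TRANSITIONS = {(0, "a"): {0}, (0, "b"): {1}}
START, ACCEPTING = 0, {1}


def approx_cost(word):
    """Minimum total edit cost for the label sequence `word` to match a*.b."""
    heap = [(0, 0, START)]   # (cost so far, position in word, automaton state)
    best = {}
    while heap:
        cost, i, state = heapq.heappop(heap)
        if best.get((i, state), float("inf")) <= cost:
            continue         # already reached this configuration at least as cheaply
        best[(i, state)] = cost
        if i == len(word) and state in ACCEPTING:
            return cost
        moves = []
        if i < len(word):
            # Insertion: the path has a label the expression did not ask for.
            moves.append((1, i + 1, state))
        for (s, label), targets in TRANSITIONS.items():
            if s != state:
                continue
            for t in targets:
                # Deletion: skip a label the expression expected.
                moves.append((1, i, t))
                if i < len(word):
                    # Exact match costs 0; substitution or inversion costs 1.
                    step = 0 if word[i] == label else 1
                    moves.append((step, i + 1, t))
        for step, next_i, next_state in moves:
            heapq.heappush(heap, (cost + step, next_i, next_state))
    return None


print(approx_cost(["a", "b"]))         # 0
print(approx_cost(["p", "a", "b"]))    # 1
print(approx_cost(["p", "a", "b-"]))   # 2
```

The three calls correspond to R, R’ and R’’ on the slide.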
What?
Relaxation is applied by using inference rules from an ontology (if one exists)
Achieved by applying logical relaxation of the query conditions using the data’s ontology definition
Relaxation operations: subclass, subproperty, domain and range
Each operation has a ‘cost’: usually 1
Example: We have an ontology:
• Humanities (superclass)
• Languages and History (subclasses of Humanities)
Assume our query states Languages may be relaxed
Languages is relaxed to Humanities:
• Instances of Languages will be returned at cost 0
• Instances of History will be returned at cost 1
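A minimal sketch of subclass relaxation on this ontology follows. The instance names (`course1` and so on), the extra class `Physics` and the function names are hypothetical, and only the subclass rule is modelled, at a cost of 1 per step up the hierarchy.

```python
# Ontology fragment from the slide: class -> direct superclass.
SUBCLASS_OF = {"Languages": "Humanities", "History": "Humanities"}

# Hypothetical instance data (illustrative names only): instance -> class.
INSTANCE_TYPE = {"course1": "Languages", "course2": "History", "course3": "Physics"}


def superclasses_with_cost(cls):
    """Yield (class, cost) for cls and each of its superclasses; each step up costs 1."""
    cost = 0
    while cls is not None:
        yield cls, cost
        cls = SUBCLASS_OF.get(cls)
        cost += 1


def is_same_or_subclass(cls, target):
    """True if cls equals target or lies below it in the subclass hierarchy."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def relax_type_query(query_class):
    """Return {instance: cost} for a relaxed conjunct (?D, type, query_class)."""
    answers = {}
    for relaxed_class, cost in superclasses_with_cost(query_class):
        for instance, cls in INSTANCE_TYPE.items():
            # Keep the lowest cost at which each instance first qualifies.
            if instance not in answers and is_same_or_subclass(cls, relaxed_class):
                answers[instance] = cost
    return answers


print(relax_type_query("Languages"))  # {'course1': 0, 'course2': 1}
```

The Languages instance is returned at cost 0 and the History instance at cost 1, as on the slide. The unrelated `Physics` instance is never returned.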
What?
Answers are ranked according to how closely they match the original query; higher-cost answers have a lower ranking
All answers at a certain distance d are ranked the same and returned before answers at a higher distance
We allow for incremental execution: exact answers returned first; then answers at distance 1; ...
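Incremental, ranked return can be sketched as a generator that asks for answers one distance at a time, so nothing at distance d + 1 is computed until the consumer has received the batch at distance d. The evaluator below is a stand-in stub, not a query engine: it replays the (Y, Z) answers and distances quoted later in the talk for the APPROX query. `incremental_answers` and `evaluate_up_to` are illustrative names.

```python
# Answers and distances quoted in the talk for the APPROX example query.
KNOWN_ANSWERS = [
    (("ep22", "AirTravelAssistant"), 1),
    (("ep23", "Journalist"), 2),
    (("ep24", "AssistantEditor"), 2),
]


def evaluate_up_to(distance):
    """Stub evaluator: all answers whose cost is at most `distance`."""
    return [answer for answer, cost in KNOWN_ANSWERS if cost <= distance]


def incremental_answers(evaluate, max_distance):
    """Yield (distance, new answers) batches in order of increasing distance."""
    seen = set()
    for distance in range(max_distance + 1):
        # Only answers not already returned at a lower distance are new.
        batch = [a for a in evaluate(distance) if a not in seen]
        seen.update(batch)
        yield distance, batch


for distance, batch in incremental_answers(evaluate_up_to, 2):
    print(distance, batch)
```

A caller that is satisfied with the exact answers can stop iterating after the first batch.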
Example – ‘Lifelong learner metadata’
[Figure: example graph of lifelong learner metadata; surviving labels: History, sc]
Query: “What work positions can I reach, having a degree in English?”
Y = the episode; Z = the job
(?Y, ?Z) ←
(?X, type, University),
(?X, qualif.type, EnglishStudies),
(?X, prereq+, ?Y),
(?Y, type, Work),
(?Y, job.type, ?Z)
No results from User 2 will be returned, even though User 2’s timeline is relevant!
Allowing query approximation can yield some answers:
Replacing the edge label prereq by next, at an edit cost of 1, we get this variant of the query:
(?Y, ?Z) ←
(?X, type, University),
(?X, qualif.type, EnglishStudies),
APPROX(?X, prereq+, ?Y),
(?Y, type, Work),
(?Y, job.type, ?Z)
prereq+ can be approximated by next.prereq* at edit distance 1:
Result: Y = ep22, Z = AirTravelAssistant
next.prereq* can be approximated by next.next.prereq*, now at edit distance 2:
Results:
Y = ep23, Z = Journalist
Y = ep24, Z = AssistantEditor
Query: “What jobs are open to me if I study English, or something similar, at University?”
(?Y, ?Z) ←
(?X, type, University), (?X, qualif, ?D),
RELAX (?D, type, EnglishStudies),
APPROX (?X, prereq+, ?Y),
(?Y, type, Work), (?Y, job.type, ?Z)
In addition to the answers (from User 2) obtained by the previous query, we now also have answers from the timeline of User 3
prereq+ can be approximated by next.prereq* (distance 1) and EnglishStudies can be relaxed (via Languages) to Humanities (distance 2), encompassing History
Result: Y = ep32, Z = PersonalAssistant (distance of 3 from original query)
next.prereq* can be approximated by next.next.prereq* (distance 2), with EnglishStudies again relaxed to Humanities (distance 2)
Results: (both at distance 4 from the original query)
Y = ep33, Z = Author
Y = ep34, Z = AssociateEditor
How?
Theory
Construction of a weighted non-deterministic finite automaton (NFA) to represent the regular expression
• We add new states and transitions to the NFA to represent the approximation and relaxation operations
Formation of a product automaton: NFA with data graph G
We perform a lowest-cost path traversal of the product automaton; construct query tree, do joins, etc.
Polynomial time complexity
Correctness of algorithms proven
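The pipeline outlined above can be illustrated with a simplified sketch, which should not be read as the prototype's code. It assumes a hand-built weighted NFA for prereq+, adds only cost-1 substitution and insertion transitions (deletion, inversion, transposition and relaxation are omitted), and runs a Dijkstra-style lowest-cost traversal over (graph node, NFA state) pairs. The graph, node names and function names are hypothetical.

```python
import heapq

# Hypothetical data graph (node names are illustrative only).
EDGES = [("u1", "next", "e1"), ("e1", "prereq", "e2"), ("u1", "prereq", "e3")]
ALPHABET = {"next", "prereq"}

# Weighted NFA for prereq+ as (state, label, next state, cost) tuples.
EXACT = [(0, "prereq", 1, 0), (1, "prereq", 1, 0)]
STATES, START, ACCEPTING = {0, 1}, 0, {1}


def approximate(transitions, states, alphabet):
    """Add cost-1 substitution and insertion transitions to an exact NFA."""
    extra = []
    for s, label, t, _ in transitions:
        # Substitution: accept any other label in place of the expected one.
        extra += [(s, other, t, 1) for other in sorted(alphabet - {label})]
    for s in sorted(states):
        # Insertion: consume an extra label without changing state.
        extra += [(s, label, s, 1) for label in sorted(alphabet)]
    return transitions + extra


def lowest_cost_answers(edges, source, transitions):
    """Traverse the product of graph and NFA; return {node: minimum cost}."""
    heap = [(0, source, START)]
    done, answers = set(), {}
    while heap:
        cost, node, state = heapq.heappop(heap)
        if (node, state) in done:
            continue
        done.add((node, state))
        if state in ACCEPTING and node not in answers:
            answers[node] = cost   # pairs pop in cost order, so this is the minimum
        for u, edge_label, v in edges:
            if u != node:
                continue
            for s, label, t, weight in transitions:
                if s == state and label == edge_label:
                    heapq.heappush(heap, (cost + weight, v, t))
    return answers


print(lowest_cost_answers(EDGES, "u1", EXACT))
# {'e3': 0}
print(lowest_cost_answers(EDGES, "u1", approximate(EXACT, STATES, ALPHABET)))
# {'e3': 0, 'e1': 1, 'e2': 1}
```

Because the traversal pops pairs in order of increasing cost, answers are found in ranked order, which fits the incremental execution described earlier. The product has at most |V| × |states| pairs, in line with the polynomial bound stated above.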
How?
Implementation of prototype
• Graph database: DEX (http://www.sparsity-technologies.com/dex)
• Programming language: C#
Further work
• New flexible operation combining APPROX and RELAX: FLEX
• Optimisation!