The Definition of GraphDB
@doryokujin
GraphDB Meet-Up Japan #1
・Takahiro Inoue(age 26)
・twitter: doryokujin
・Majored in Math (Statistics & Graph Algorithms)
・Data Scientist
・Leader of MongoDB JP
・Interests: Data Processing, GraphDB
About Me
(1) Graph Type for GraphDB ~Which Graph Is Better for GraphDB?~
(2) Graph Traversals ~Graph Query ≡ Graph Traversal~
(3) Index Free Adjacency ~The Key to the Definition of GraphDB~
(4) Other Topics
Agenda
(1) Graph Type for GraphDB ~Which Graph Is Better for GraphDB?~
・A graph is an ordered pair G = (V, E)
・Set V of nodes
・Set E of edges
- 2-element subsets of V
- Representing a “relationship” between nodes
- Directed or undirected
Definition of Graph
[Undirected Graph]
・Edges have no orientation
・Edges are not ordered pairs but sets {u, v}, i.e. edge (a, b) ≡ (b, a)
・All nodes have the same object type
・All edges have the same relationship
Def. Undirected Graph
G = (V, E)
[Directed Graph (Digraph)]
・Ordered pair D = (V, A)
・A: Set of ordered pairs of nodes, called “arrows”
・All nodes have the same object type
・All edges have the same relationship
Def. Directed Graph
D = (V, A)
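To make the two definitions concrete, here is a minimal Python sketch, assuming nodes are plain strings; the user names are hypothetical. An undirected edge is stored as a `frozenset`, so {u, v} and {v, u} are the same edge, while a directed arrow is an ordered tuple.

```python
# Undirected graph G = (V, E): each edge is an unordered 2-element set {u, v}.
V = {"alice", "bob", "carol"}  # hypothetical users
E = {frozenset({"alice", "bob"}), frozenset({"bob", "carol"})}

# Directed graph D = (V, A): each arrow is an ordered pair (u, v).
A = {("alice", "bob"), ("bob", "carol")}


def undirected_adjacent(u: str, v: str) -> bool:
    # Order does not matter: {u, v} == {v, u}.
    return frozenset({u, v}) in E


def directed_adjacent(u: str, v: str) -> bool:
    # Order matters: (u, v) is a different arrow from (v, u).
    return (u, v) in A


print(undirected_adjacent("bob", "alice"))  # True
print(directed_adjacent("bob", "alice"))    # False
```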
Example: (Un)Directed Graph
[Facebook]
・node object type: “user”
・relationship of all edges: “friend”
・Facebook friendship is symmetric
[Twitter]
・node object type: “user”
・relationship of all edges: “follow”
・Twitter follow action is asymmetric
[Mixed Graph]
・Some edges may be directed and others undirected
[Multigraph]
・May include loops (directed or undirected edges from a node to itself) and multiple edges between the same pair of nodes
Def. Mixed Graph, Multigraph
G = (V, E, A)
D = (V, A)
・These types of graphs can share a common representation
・undirected edge --> 2 directed edges
・symmetric edge --> 2 asymmetric directed edges
・loops and multiple edges are allowed
Common Representation
・No undirected edges
・No symmetric edges
Common Representation
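The conversion rules above can be sketched as follows, assuming edges arrive as plain (u, v) pairs; `to_common_representation` is a hypothetical name, not an API from any GraphDB. Undirected and symmetric edges both become two arcs, and a `collections.Counter` keeps multiplicities so that multiple edges and loops survive the conversion.

```python
from collections import Counter


def to_common_representation(undirected, directed):
    """Map a mixed (multi)graph onto a single directed multigraph.

    undirected, directed: iterables of (u, v) pairs (hypothetical input format).
    Returns a Counter mapping each arc (u, v) to its multiplicity.
    """
    arcs = Counter()
    for u, v in undirected:
        arcs[(u, v)] += 1
        if u != v:  # assumption: an undirected loop becomes a single arc
            arcs[(v, u)] += 1
    for u, v in directed:
        arcs[(u, v)] += 1  # directed arcs (including loops) are kept as-is
    return arcs


# One undirected edge, one multiple (parallel) arc, and one loop.
undirected = [("a", "b")]
directed = [("b", "c"), ("b", "c"), ("c", "c")]
arcs = to_common_representation(undirected, directed)
print(sorted(arcs.items()))
# [(('a', 'b'), 1), (('b', 'a'), 1), (('b', 'c'), 2), (('c', 'c'), 1)]
```

Treating an undirected loop as a single arc is a choice made for this sketch; other conventions count it twice.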
Def. Single-Relational Graph [Single-Relational Structures]
・Multigraph
・All edges must have the same relationship
・All nodes must have the same object type
・All graphs introduced so far are single-relational (SR) graphs
Is this class sufficient for a graph database?
[Multi-Relational Structures]
・More flexible than single-relational structures
・All edges are directed and asymmetric
・Each edge can have a different relationship
・Each node can have a different object type
Def. Multi-Relational Graph
Example: Multi-Relational Graph
・4 types of relationships: “Reply”, “DM”, “RT”, “Block”
・Every node still has the same object type
[Twitter]
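One simple way to hold a multi-relational graph is to label every edge, i.e. store (tail, relationship, head) triples. The sketch below assumes hypothetical Twitter-style users and groups outgoing edges by relationship, so a query can follow only, say, “Reply” edges.

```python
from collections import defaultdict

# Hypothetical Twitter-style data: each edge is a (tail, relationship, head) triple,
# so different edges can carry different relationships.
edges = [
    ("alice", "Reply", "bob"),
    ("bob", "DM", "alice"),
    ("carol", "RT", "alice"),
    ("alice", "Block", "dave"),
    ("alice", "Reply", "carol"),
]

# Group outgoing edges by (node, relationship).
out_by_label = defaultdict(list)
for tail, label, head in edges:
    out_by_label[(tail, label)].append(head)


def neighbors(node, label):
    """Heads reached from `node` over edges with the given relationship."""
    return out_by_label.get((node, label), [])


print(neighbors("alice", "Reply"))  # ['bob', 'carol']
print(neighbors("alice", "DM"))     # []
```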
Example: Multi-Relational Graph
・Many types of relationships
・Connection: user --> item
・Connection: user <--> user
[Livlis] http://www.livlis.com/
Def. Property Graph
[Property Graph]
・Multi-Relational Graph
・Each node and edge has some properties
・Each property is a “key-value” pair, and properties are schema-free
[Figure: users A and B with properties such as id, name, follow, follower, and sex, connected by “follow” edges that carry a “since” date]
Example: Property Graph
[Livlis]
[Figure: the Livlis graph annotated with key-value properties, e.g. name, follow, follower, sex, favorite, price, access, wanted, liked, and since]
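A property graph adds key-value properties to every node and edge. The Python sketch below is an illustration under assumptions: the `Node` and `Edge` classes are hypothetical, and the values are loosely taken from the Livlis figure labels. Plain dictionaries keep the properties schema-free.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Node:
    id: str
    kind: str  # object type, e.g. "user" or "item"
    props: Dict[str, Any] = field(default_factory=dict)  # schema-free key-value pairs


@dataclass
class Edge:
    tail: Node
    label: str  # relationship, e.g. "follow" or "want!"
    head: Node
    props: Dict[str, Any] = field(default_factory=dict)


# Illustrative values loosely based on the Livlis figure.
a = Node("id_A", "user", {"name": "A", "follow": 100, "follower": 200})
b = Node("id_B", "user", {"name": "B", "follow": 10, "follower": 20})
item = Node("item_1", "item", {"price": 50})

edges = [
    Edge(a, "follow", b, {"since": "2011/01/01"}),
    Edge(a, "want!", item),
]

# Schema-free: a new key can be added to one element without touching the others.
item.props["liked"] = 30

# Follow "want!" edges out of A and read a property on each head node.
wanted_prices = [e.head.props["price"] for e in edges if e.tail is a and e.label == "want!"]
print(wanted_prices)  # [50]
```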
Def. Hyper Graph [Hyper Graph]
・Set V of nodes
・Set E of non-empty subsets of V
・i.e. an edge can connect more than two nodes
・Every node or edge carries an arbitrary value as its payload
・Property Graph ⊂ Hyper Graph
H = (V, E)
Sones: manage edge types with GraphDB 2.1
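A hypergraph edge is simply a set of nodes, so it can join more than two of them. This sketch assumes undirected hyperedges stored as `frozenset`s with a payload dictionary; all names and payloads are hypothetical.

```python
# Hypergraph H = (V, E), sketched with undirected hyperedges (an assumption):
# each hyperedge is a non-empty subset of V plus an arbitrary payload.
V = {"alice", "bob", "carol", "dave"}
hyperedges = {
    # Hypothetical names and payloads, for illustration only.
    "meeting_1": (frozenset({"alice", "bob", "carol"}), {"topic": "GraphDB"}),
    "pair_1": (frozenset({"bob", "dave"}), {"since": 2011}),
}

# Every hyperedge must be a non-empty subset of V.
assert all(len(members) > 0 and members <= V for members, _ in hyperedges.values())


def incident(node):
    """Names of the hyperedges that contain `node`, sorted."""
    return sorted(name for name, (members, _) in hyperedges.items() if node in members)


print(incident("bob"))  # ['meeting_1', 'pair_1']
```

An ordinary edge is the special case of a 2-element hyperedge.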
・The property graph has a flexible representation
・Key features:
- All edges are directed and asymmetric
- Each edge can have a different relationship
- Each node can have a different object type
- All elements have key-value properties
・Many GraphDBs support the property graph model
※ Some GraphDBs support the hypergraph model
Summary
(2) Graph Traversals ~Graph Query ≡ Graph Traversal~
Graph Query ≡ Graph Traversal
・Not a “global” search as in RDBMSs or other NoSQL stores
・But a traversal over the graph starting from a “root node”
・“Locality” is very important
Graph Traversals
Property Graph Algorithms
・To traverse a graph is to process every node in the graph exactly once
・The two most common traversal patterns are breadth-first traversal and depth-first traversal
・At each step, the traverser moves to its adjacent vertices
・The step is repeated a specified number of times or until some condition is fulfilled
Graph Traversals
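Both traversal patterns can be sketched in a few lines of Python, assuming the graph is given as a hypothetical adjacency-list dictionary. Each function visits every node reachable from the root exactly once; they differ only in whether the frontier is a FIFO queue (breadth-first) or a LIFO stack (depth-first).

```python
from collections import deque


def bfs(adj, root):
    """Breadth-first traversal: visit reachable nodes level by level."""
    seen = {root}
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


def dfs(adj, root):
    """Depth-first traversal (iterative): follow one branch as deep as possible first."""
    seen = set()
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push in reverse so neighbors are visited in their listed order.
        stack.extend(reversed(adj.get(node, [])))
    return order


adj = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": []}
print(bfs(adj, "A"))  # ['A', 'B', 'C', 'D', 'E']
print(dfs(adj, "A"))  # ['A', 'B', 'D', 'C', 'E']
```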
The Graph Traversal Pattern
[Figure: vertex 1 (name=Alberto Pepe) with friend-labeled edges to vertices 2, 3, and 4; the highlighted path applies the steps e_out, e_lab+ (friend), v_in, and ε_name]
Fig. 3. A single path along the f traversal.
those edges with the label friend, then traverse to the incoming (i.e. head) vertices on those friend-labeled edges. Finally, of those vertices, return their name property.²¹ A single legal path according to this function is diagrammed in Figure 3. Though not diagrammed for the sake of clarity, the traversal would also go from vertex 1 to the name of vertex 2 and vertex 3. The function f is a “higher-order” adjacency defined as the composition of explicit adjacencies and serves as a join of Alberto and his friends’ names.²² The remainder of this section demonstrates graph traversals in real-world problem-solving situations.
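The composition described above can be mimicked with plain Python functions. This is a sketch, not the paper's implementation: the step names `e_out`, `e_lab`, `v_in`, and `prop` are hypothetical stand-ins for the paper's adjacency steps, each maps a list of elements to a list of elements, and `compose` applies them right to left as footnote 21 describes. Vertex names other than “Alberto Pepe” are made up.

```python
from functools import reduce

# Hypothetical toy graph echoing Figure 3: vertex 1 has three friend-labeled
# edges and one edge with a different label.
vertex_props = {
    1: {"name": "Alberto Pepe"},
    2: {"name": "friend_2"},
    3: {"name": "friend_3"},
    4: {"name": "friend_4"},
    5: {"name": "project_5"},
}
edges = [
    {"out": 1, "label": "friend", "in": 2},
    {"out": 1, "label": "friend", "in": 3},
    {"out": 1, "label": "friend", "in": 4},
    {"out": 1, "label": "created", "in": 5},
]


def e_out(vertices):
    """Outgoing edges of each vertex (a linear scan keeps the sketch short)."""
    return [e for v in vertices for e in edges if e["out"] == v]


def e_lab(label):
    """Filter step: keep only edges carrying `label`."""
    return lambda es: [e for e in es if e["label"] == label]


def v_in(es):
    """Incoming (head) vertex of each edge."""
    return [e["in"] for e in es]


def prop(key):
    """Property step: read `key` from each vertex."""
    return lambda vs: [vertex_props[v][key] for v in vs]


def compose(*steps):
    """Right-to-left composition: compose(f, g)(x) == f(g(x))."""
    return reduce(lambda f, g: lambda x: f(g(x)), steps)


# f = name ∘ v_in ∘ friend-filter ∘ e_out, applied to vertex 1.
f = compose(prop("name"), v_in, e_lab("friend"), e_out)
print(f([1]))  # ['friend_2', 'friend_3', 'friend_4']
```

The “created” edge is dropped by the filter step, so only the friends' names come back.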
3.1 Traversing for Recommendation
Recommendation systems are designed to help people deal with the problem of information overload by filtering out information in the system that does not pertain to the person [14]. In a positive sense, recommendation systems focus a person’s attention on those resources that are likely to be most relevant to their particular situation. There is a standard dichotomy in recommendation research: content-based vs. collaborative filtering-based recommendation. The former deals with recommending resources that share characteristics (i.e. content) with a set of resources. The latter is concerned with determining the similarity of resources based upon the similarity of the taste of the people modeled within the system [6]. These two seemingly different approaches to recommendation are conveniently solved using a graph database and two simple traversal techniques [10, 5]. Figure 4 presents a toy graph data set, where there exist a set of people, resources, and features related to each other by likes- and feature-labeled edges. This simple data set is used for the remaining examples of this subsection.
²¹ Note that the order of a composition is evaluated from right to left.
²² This is known as a virtual edge in the graph system called DEX [9].
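The two recommendation styles can be approximated by path counting over likes- and feature-labeled edges. This is a sketch under assumptions, not necessarily the exact method of the cited works: the data is a hypothetical stand-in for Figure 4, and ranking by the number of paths reaching each resource is one simple scoring choice.

```python
from collections import Counter

# Hypothetical toy data standing in for Figure 4: people like resources,
# and resources are linked to features. Each edge is (tail, label, head).
edges = [
    ("ann", "likes", "r1"), ("ann", "likes", "r2"),
    ("bob", "likes", "r2"), ("bob", "likes", "r3"),
    ("cat", "likes", "r2"), ("cat", "likes", "r3"), ("cat", "likes", "r4"),
    ("r1", "feature", "f1"), ("r4", "feature", "f1"), ("r3", "feature", "f2"),
]


def out_v(node, label):
    """Heads of `label` edges leaving `node`."""
    return [h for t, lab, h in edges if t == node and lab == label]


def in_v(node, label):
    """Tails of `label` edges entering `node`."""
    return [t for t, lab, h in edges if h == node and lab == label]


def collaborative(person):
    """person -likes-> r <-likes- other -likes-> r2, counting paths to each new r2."""
    mine = set(out_v(person, "likes"))
    scores = Counter()
    for r in mine:
        for other in in_v(r, "likes"):
            if other == person:
                continue
            for r2 in out_v(other, "likes"):
                if r2 not in mine:
                    scores[r2] += 1
    return scores


def content_based(resource):
    """resource -feature-> f <-feature- r2, counting shared features per r2."""
    scores = Counter()
    for feat in out_v(resource, "feature"):
        for r2 in in_v(feat, "feature"):
            if r2 != resource:
                scores[r2] += 1
    return scores


print(collaborative("ann").most_common())  # [('r3', 2), ('r4', 1)]
print(content_based("r1").most_common())   # [('r4', 1)]
```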
・Single step traversal: from element i to element j, where i, j ∈ (V ∪ E).
・Graph traversals of arbitrary length can be defined by composing single-step traversals
・Querying is performed through traversals, which can perform millions of “joins” per second
Graph Traversals
The Graph Traversal Pattern
Graph Traversals
Basic Graph Traversals
・GraphDB is efficient for local data analysis (recommendation, social analytics, shortest path), all of which focus on “a user”
・Locality is defined by direct referent structures
・Frame all solutions to problems as a traversal over local regions of the graph
Summary
(3) Index Free Adjacency ~The Key to the Definition of GraphDB~
※ GraphDB is not the only kind of database that can model graph structures (RDBs, document DBs, etc. can too)
[definition]
・A graph database is any storage system that provides “index-free adjacency”
The Definition of GraphDB
The Graph Traversal Programming Pattern
[Important feature]
・Mini-Index: every element (node or edge) has direct pointers to its adjacent elements
・No index lookup: we can determine which vertex is adjacent to which other vertex without looking up an index tree
The Definition of GraphDB
Relational Data Model
[Figure: graph data stored in a table (column1, column2, column3) with an index tree over it]
Graph Databases and Endogenous Indices
[Figure: a property graph whose vertices (e.g. name=twarko, age=30; name=ahzf; name=graph_blog, views=1000; name=tenderlove, gender=male; name=neo4j, views=56781; name=peterneubauer) are connected by created, follows, and cites edges, with other properties such as date=2007/10 and page_rank=0.023, alongside separate name, views, and gender property indices]
Graph Databases Make Use of Indices
[Figure: the graph (vertices A to E) next to an index of its vertices by id]
• There is more to the graph than the explicit graph structure.
• Indices index the vertices by their properties (e.g. ids).
Indexing of Vertices
Graph Data
The Graph Traversal Programming Pattern
[Figure: an index tree over vertices A to E, and the graph data stored as adjacency lists]
1. Want to determine the neighbors of A
2. Look up A in the index tree: O(log₂ n) time cost
3. Get the adjacency list (B, C)
4. Move to either B or C
Relational Data Model
[Index-tree] [Graph Data]
Lookup cost grows as the graph grows: O(log₂ n)
Lookup cost becomes very high
Relational Data Model
Traversal takes a long time
Relational Data Model
・Insert time: as the graph grows, each insert becomes more expensive, since the index must also be updated
・Lookup time: as the graph grows, the cost of a lookup grows in proportion to log₂ n, i.e. O(log n)
・Memory size: as the graph grows, the memory size grows
Cost of Looking Up Index-tree
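The index-tree lookup described above can be imitated with Python's `bisect` module, assuming a hypothetical edges table kept sorted by source vertex as a stand-in for a B-tree index. Every hop of a traversal pays a binary search, O(log₂ n) in the number of rows, before it can read the adjacency list.

```python
import bisect

# Hypothetical edges table: rows (source, target) sorted by source, standing in
# for a relational table with a B-tree index on `source`.
table = sorted([("A", "B"), ("A", "C"), ("B", "E"), ("C", "D"), ("C", "E")])
index_keys = [src for src, _ in table]  # sorted keys searched on every lookup


def neighbors_via_index(v):
    # Binary search over the keys: O(log2 n) comparisons for n rows.
    lo = bisect.bisect_left(index_keys, v)
    hi = bisect.bisect_right(index_keys, v)
    return [dst for _, dst in table[lo:hi]]


def two_hops(v):
    # Each hop of the traversal repeats the index lookup.
    return [w for u in neighbors_via_index(v) for w in neighbors_via_index(u)]


print(neighbors_via_index("A"))  # ['B', 'C']
print(two_hops("A"))             # ['E', 'D', 'E']
```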
Graph DB Model
[Graph Data]
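[Mini-Index] direct references to its adjacent vertices
[Figure: vertices A to G, each storing direct references to its adjacent vertices, e.g. A → B, C]
[Constant time]: the cost of a step depends only on the number of connected edges, not on the size of the graph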
Mini-Index: Graph DB Model
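Index-free adjacency can be sketched by letting each vertex hold references to its neighbors. In this hypothetical `Vertex` class, Python object references stand in for a GraphDB's direct pointers, so a hop never consults a global index.

```python
class Vertex:
    """A vertex that keeps direct references to its adjacent vertices.

    The `out` list plays the role of the "mini-index"; Python object
    references stand in for a GraphDB's stored pointers (an assumption).
    """

    def __init__(self, name):
        self.name = name
        self.out = []  # direct pointers to adjacent Vertex objects

    def add_edge(self, other):
        self.out.append(other)


a, b, c, d, e = (Vertex(n) for n in "ABCDE")
for src, dst in [(a, b), (a, c), (b, e), (c, d), (c, e)]:
    src.add_edge(dst)


def two_hops(v):
    # No global index is consulted: each hop just follows `out`, so its cost
    # depends on the vertex's degree, not on the total number of vertices.
    return [w.name for u in v.out for w in u.out]


print(two_hops(a))  # ['E', 'D', 'E']
```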
[Graph Data]
The cost of a local step remains the same as the graph grows
・An external indexing system is used to index the properties of vertices and edges
Indexing their properties
Graph Databases and Endogenous Indices
The Graph Traversal Programming Pattern
・GraphDB provides “index-free adjacency”
・No looking up index-tree, each element has direct pointers
・They have an external index system for their properties (both nodes and relationships)
・A very large graph can be stored on a single server, because the cost of a traversal does not grow with the size of the graph
Summary
(4) Other Benefits of GraphDB
A Graph Database Transforms an RDBMS
← RDBMS
↓ GraphDB as RDBMS
Comparing Database Models
A Graph Database Transforms a Key-Value Store
← RDBMS
↓ GraphDB as Key-Value
Comparing Database Models
A Graph Database Transforms a Document DB
↑ Document DB
↓ GraphDB as RDBMS
Comparing Database Models
Example of e-commerce site
Square Pegs and Round Holes in the NOSQL World
Square Pegs and Round Holes in the NOSQL World
Example of e-commerce site
Square Pegs and Round Holes in the NOSQL World
Example of e-commerce site
Recommendation!!
(1) Graph Type for GraphDB ~Which Graph Is Better for GraphDB?~
(2) Graph Traversals ~Graph Query ≡ Graph Traversal~
(3) Index Free Adjacency ~The Key to the Definition of GraphDB~
Did You Understand?