The Definition of GraphDB

The Definition of GraphDB @doryokujin GraphDB Meet-Up Japan #1


Transcript of The Definition of GraphDB

Page 1: The Definition of GraphDB

The Definition of GraphDB

@doryokujin

GraphDB Meet-Up Japan #1

Page 2: The Definition of GraphDB

・Takahiro Inoue(age 26)

・twitter: doryokujin

・Majored in Math (Statistics & Graph Algorithm)

・Data Scientist

・Leader of MongoDB JP

・Interest: DataProcessing, GraphDB

About Me

Page 3: The Definition of GraphDB

(1) Graph Type for GraphDB~Which Graph is Better for GraphDB ?~

(2) Graph Traversals~ Graph Query ≡ Graph Traversal ~

(3) Index Free Adjacency~The Key of Definition of GraphDB~

(4) Other Topics

Agenda

Page 4: The Definition of GraphDB

(1) Graph Class for GraphDB~Which Graph is Better for GraphDB ?~

Page 5: The Definition of GraphDB

・Graph is an ordered pair G = (V, E)

・Set V of Nodes

・Set E of Edges

- 2 Element Subsets of V

- Representing “Relationship” Between Nodes

- Directed or Undirected

Definition of Graph
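
A minimal Python sketch of this definition, assuming hypothetical node names and an illustrative helper `is_valid_graph` (neither comes from the slides). An undirected graph is held as a node set plus a set of 2-element subsets of it:

```python
# Undirected graph G = (V, E): E is a set of 2-element subsets of V.
V = {"a", "b", "c"}
E = {frozenset({"a", "b"}), frozenset({"b", "c"})}


def is_valid_graph(nodes, edges):
    """Check that every edge is a 2-element subset of the node set."""
    return all(len(e) == 2 and e <= nodes for e in edges)


# Because edges are sets, {a, b} and {b, a} are the same edge.
same_edge = frozenset({"a", "b"}) == frozenset({"b", "a"})
```

For a directed graph the same sketch would use ordered tuples `(u, v)` instead of `frozenset`, so that `(a, b)` and `(b, a)` stay distinct.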

Page 6: The Definition of GraphDB

[Undirected Graph]

・Edges have no orientation

・Not ordered pairs, but sets {u, v} i.e. Edge (a, b) ≡ (b, a)

・All nodes have the same object type

・All edges have the same relationship

Def. Undirected Graph

G = (V, E)

Page 7: The Definition of GraphDB

[Directed Graph (Digraph)]

・Ordered pair D = (V, A)

・A: Set of ordered pairs of nodes, called “arrows”

・All nodes have the same object type

・All edges have the same relationship

Def. Directed Graph

D = (V, A)

symmetric

Page 8: The Definition of GraphDB

Example: (Un)Directed Graph

[Facebook]

・relationship of all edges: “friend”

・Facebook friend is symmetric

・node object type: “user”

[Twitter]

・relationship of all edges: “follow”

・Twitter follow action is asymmetric

・node object type: “user”

Page 9: The Definition of GraphDB

[Mixed Graph]

・Some edges may be directed and some may be undirected

[Multigraph]

・Including (direct/indirect) loop edges and multiple edges

Def. Mixed Graph, Multigraph

G = (V, E, A)

D = (V, A)

loop

multiple

Page 10: The Definition of GraphDB

・These types of graphs can have common representation

・undirected edge --> 2 directed edges

・symmetric edge --> 2 asymmetric directed edges

・allows loop and multiple edge

Common Representation
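
The conversion listed above can be sketched in Python. The function name `to_common_representation` and the tuple encoding of edges are assumptions made for illustration only:

```python
def to_common_representation(directed, undirected):
    """Return one list of directed (source, target) edges.

    Each undirected edge (u, v) becomes the two directed edges
    (u, v) and (v, u). A list (not a set) is used so that multiple
    edges and loops are preserved.
    """
    edges = list(directed)
    for u, v in undirected:
        edges.append((u, v))
        if u != v:  # a loop needs no reverse copy
            edges.append((v, u))
    return edges


common = to_common_representation([("a", "b")], [("b", "c")])
```

Here `common` holds only directed edges, which is the form the later property graph model assumes.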

Page 11: The Definition of GraphDB

[Figure: graph containing symmetric, undirected, multiple and loop edges, redrawn in the common representation]

・No undirected edge

・No symmetric edge

Common Representation

Page 12: The Definition of GraphDB

Def. Single-Relational Graph

[Single-Relational Structures]

・Multigraph

・All edges must be the same relationship

・All nodes must be the same object type

・All graphs already introduced are SR-Graphs

Is this class sufficient for a graph database?

Page 13: The Definition of GraphDB

[Multi-Relational Structures]

・More flexible than single-relational structures

・All edges are directed and asymmetric

・Each edge can have a different relationship

・Each node can have a different object type

Def. Multi-Relational Graph

Page 14: The Definition of GraphDB

Example: Multi-Relational Graph

・4 types of relationships: “Reply”, “DM”, “RT”, “Block”

・Every node still has the same object type

[Twitter]

[Figure: Twitter users connected by edges labeled Reply, DM, RT and Block]

Page 15: The Definition of GraphDB

Example: Multi-Relational Graph

・Many types of relationship

・Connection: user --> item

・Connection: user <--> user

[Figure: users and items connected by edges labeled want!, follow, like!, exhibit, invite, bought! and message]

[Livlis] http://www.livlis.com/

Page 16: The Definition of GraphDB

Def. Property Graph

[Property Graph]

・Multi-Relational Graph

・Each node and edge has some properties

・Each property is represented as a “key-value” pair and is schema-free

[Figure: two user nodes joined by a “follow” edge]

・Node: id id_A, follow 100, follower 200, since 2011/01/01

・Node: id id_B, follow 500, follower 1000, since 2011/06/01

・Edge “follow”: since 2011/01/23
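
One way to picture a property graph in code is the following Python sketch. The class names `Node` and `Edge` are illustrative assumptions, and the direction of the follow edge (A to B) is also an assumption, since the transcript does not preserve it. The property values are the ones shown on the slide:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    # Schema-free key-value properties: any keys may appear.
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: Node
    label: str  # the relationship type, e.g. "follow"
    target: Node
    properties: dict = field(default_factory=dict)


a = Node("id_A", {"follow": 100, "follower": 200, "since": "2011/01/01"})
b = Node("id_B", {"follow": 500, "follower": 1000, "since": "2011/06/01"})
# Assumed direction: A follows B.
follow = Edge(a, "follow", b, {"since": "2011/01/23"})
```

Because `properties` is a plain dictionary, two nodes of the same kind need not share the same keys, which is what schema-free means here.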

Page 18: The Definition of GraphDB

Example: Property Graph

[Livlis]

[Figure: the Livlis graph with key-value properties attached to nodes and edges. Edge labels: want!, follow, like!, exhibit, invite, bought!, message]

・User node: name A, follow 100, follower 200, sex man

・User node: name B, follow 10, follower 20, sex man

・Other properties shown: favorite 50; since 01/01/01; price $50; access 500, wanted 10, liked 30

Page 19: The Definition of GraphDB

Def. Hyper Graph

[Hyper Graph]

H = (V, E)

・Set V of Nodes

・Set E of non-empty subsets of V

・i.e. an edge can connect more than two nodes

・Every node or edge carries an arbitrary value as payload

・Property Graph ⊂ Hyper Graph

Sones: manage edge types with GraphDB 2.1
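
As a rough Python sketch of the H = (V, E) definition (the node names and the helper `is_valid_hypergraph` are hypothetical), a hyperedge is simply a non-empty subset of V of any size:

```python
V = {"a", "b", "c", "d"}
# Each hyperedge is a non-empty subset of V and may join any number of nodes.
E = {frozenset({"a", "b"}), frozenset({"a", "c", "d"})}


def is_valid_hypergraph(nodes, edges):
    """Check that every hyperedge is a non-empty subset of the node set."""
    return all(len(e) >= 1 and e <= nodes for e in edges)


largest_edge_size = max(len(e) for e in E)
```

An ordinary undirected edge is the special case of a hyperedge with exactly two members.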

Page 20: The Definition of GraphDB

・The Property Graph has a flexible representation

・Key features:

- All edges are directed and asymmetric

- Each edge can have a different relationship

- Each node can have a different object type

- All elements have properties in key-value style

・Many GraphDBs support the Property Graph Model

※ Some GraphDBs support the Hyper Graph Model

Summary

Page 21: The Definition of GraphDB

(2) Graph Traversals~ Graph Query ≡ Graph Traversal ~

Page 22: The Definition of GraphDB

Graph Query ≡ Graph Traversal 

・Not a “global” search as in an RDBMS or other NoSQL stores

・Instead, traverse over the graph starting from a “root node”

・“Locality” is very important

Graph Traversals

Property Graph Algorithms

Page 23: The Definition of GraphDB

・To traverse a graph is to process every node in the graph exactly once

・The two most common traversal patterns are breadth-first traversal and depth-first traversal

・At each step, the traverser moves to its adjacent vertices

・Repeat the step a specified number of times or until some condition is fulfilled

Graph Traversals
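
The breadth-first pattern described above can be sketched in Python as follows. The adjacency dictionary, the function name `breadth_first` and the `max_depth` stopping rule are illustrative assumptions:

```python
from collections import deque


def breadth_first(adjacency, root, max_depth):
    """Visit nodes reachable from root within max_depth steps, each once."""
    visited = [root]
    seen = {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # stop condition: do not expand beyond max_depth
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                visited.append(neighbor)
                queue.append((neighbor, depth + 1))
    return visited


# Hypothetical graph used only for this example.
ADJACENCY = {"A": ["B", "C"], "B": ["E"], "C": ["D", "E"]}
one_step = breadth_first(ADJACENCY, "A", 1)
two_steps = breadth_first(ADJACENCY, "A", 2)
```

Replacing `queue.popleft()` with `queue.pop()` would process the most recently discovered node first, which turns the same loop into a depth-first-style traversal.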

Page 24: The Definition of GraphDB

[Excerpt: The Graph Traversal Pattern, p. 9]

[Figure: vertex 1 (name=Alberto Pepe) and vertices 2, 3 and 4 (name=...), joined by friend-labeled edges]

Fig. 3. A single path along the f traversal.

those edges with the label friend, then traverse to the incoming (i.e. head) vertices on those friend-labeled edges. Finally, of those vertices, return their name property. (21) A single legal path according to this function is diagrammed in Figure 3. Though not diagrammed for the sake of clarity, the traversal would also go from vertex 1 to the name of vertex 2 and vertex 3. The function f is a “higher-order” adjacency defined as the composition of explicit adjacencies and serves as a join of Alberto and his friend’s names. (22) The remainder of this section demonstrates graph traversals in real-world problem-solving situations.

3.1 Traversing for Recommendation

Recommendation systems are designed to help people deal with the problem of information overload by filtering information in the system that doesn’t pertain to the person [14]. In a positive sense, recommendation systems focus a person’s attention on those resources that are likely to be most relevant to their particular situation. There is a standard dichotomy in recommendation research: that of content- vs. collaborative filtering-based recommendation. The prior deals with recommending resources that share characteristics (i.e. content) with a set of resources. The latter is concerned with determining the similarity of resources based upon the similarity of the taste of the people modeled within the system [6]. These two seemingly different techniques to recommendation are conveniently solved using a graph database and two simple traversal techniques [10, 5]. Figure 4 presents a toy graph data set, where there exist a set of people, resources, and features related to each other by likes- and feature-labeled edges. This simple data set is used for the remaining examples of this subsection.

(21) Note that the order of a composition is evaluated from right to left.

(22) This is known as a virtual edge in the graph system called DEX [9].

・Single step traversal: from element i to element j, where i, j ∈ (V ∪ E).

・Can define graph traversals of arbitrary length from single step traversal

・Querying is performed through traversals, which can perform millions of "joins" per second

Graph Traversals

The Graph Traversal Pattern
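
To illustrate how a longer traversal is built from single steps, here is a small Python sketch. The toy edge list, the placeholder names and all function names are assumptions made for this example and are not taken from the excerpt's figure:

```python
# Toy graph as (tail, label, head) triples. Ids and names are placeholders.
EDGES = [(1, "friend", 2), (1, "friend", 3), (1, "likes", 4), (2, "friend", 4)]
NAMES = {1: "user-1", 2: "user-2", 3: "user-3", 4: "user-4"}


def out_edges(vertices):
    """Single step: vertices -> their outgoing edges."""
    return [e for e in EDGES if e[0] in vertices]


def labeled(edges, label):
    """Filter step: keep only edges carrying the given label."""
    return [e for e in edges if e[1] == label]


def heads(edges):
    """Single step: edges -> the vertices they point to."""
    return [e[2] for e in edges]


def names(vertices):
    """Single step: vertices -> their name property."""
    return [NAMES[v] for v in vertices]


def friend_names(vertex):
    """Composition of the single steps above, read from right to left."""
    return names(heads(labeled(out_edges([vertex]), "friend")))
```

Each helper moves between elements of V ∪ E, and `friend_names` plays the role of a join between a vertex and the names of its friends without any table lookup.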

Page 25: The Definition of GraphDB

Graph Traversals

Basic Graph Traversals

Page 26: The Definition of GraphDB

・GraphDB is efficient for local data analysis (Recommendation, Social Analytics, Shortest Path). These all focus on “a user”

・Locality is defined by direct referent structures

・Frame all solutions to problems as a traversal over local regions of the graph

Summary

Page 27: The Definition of GraphDB

(3) Index Free Adjacency~The Key of Definition of GraphDB~

Page 28: The Definition of GraphDB

※ A GraphDB is not just any database that can model a graph structure (an RDB, a document store, etc. can do that too)

[definition]

・A graph database is any storage system that provides “index-free adjacency”

The Definition of GraphDB

The Graph Traversal Programming Pattern

Page 29: The Definition of GraphDB

[Important feature]

・Mini Index: Every element (node or edge) has a direct pointer to its adjacent element

・No Index lookup: we can determine which vertex is adjacent to which other vertex without looking up an index-tree

The Definition of GraphDB
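
A minimal Python sketch of the idea, assuming a hypothetical `Vertex` class: each vertex stores direct references to its neighbors, so finding them never consults a global index.

```python
class Vertex:
    """A vertex that carries its own 'mini index' of adjacent vertices."""

    def __init__(self, name):
        self.name = name
        self.out = []  # direct references to adjacent Vertex objects

    def connect(self, other):
        self.out.append(other)


a, b, c = Vertex("A"), Vertex("B"), Vertex("C")
a.connect(b)
a.connect(c)

# Neighbors are reached by following references held by the vertex itself.
neighbor_names = [v.name for v in a.out]
```

Python object references stand in here for the on-disk pointers a real graph database would use; the sketch shows only the access pattern, not a storage format.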

Page 30: The Definition of GraphDB

Relational Data Model

[Graph data in table] Table with columns column1, column2, column3 (rows 1 to 8)

[Index-tree]

[Figure: “Graph Databases and Endogenous Indices”. A property graph with edge labels created, follows and cites; properties such as name=twarko age=30, name=ahzf, name=graph_blog views=1000, name=tenderlove gender=male, date=2007/10, name=neo4j views=56781, page_rank=0.023, name=peterneubauer; plus name, views and gender property indices]

[Figure: “Graph Databases Make Use of Indices”. The graph (vertices A, B, C, D, E) next to an index of vertices (by id)]

• There is more to the graph than the explicit graph structure.

• Indices index the vertices by their properties (e.g. ids).

Indexing of Vertices

Graph Data

The Graph Traversal Programming Pattern

Page 31: The Definition of GraphDB

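[Index-tree] [Graph Data]

[Figure: an index-tree over the vertices A, B, C, D, E with adjacency lists (B, C), (E), (D, E), next to the graph of A to E]

1. Want to determine the neighbors of A

2. Looking up the index-tree: log_2(n) time cost

3. Getting the adjacency list (B, C)

4. Moving to either B or C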

Relational Data Model

Page 32: The Definition of GraphDB

[Index-tree] [Graph Data]

The lookup cost becomes larger as the graph grows: O(log_2 n)

The lookup cost becomes very high

Traversing takes a long time

Relational Data Model

Page 33: The Definition of GraphDB

・Insert time: as the graph grows in size, the cost of an insert becomes high

・Lookup time: as the graph grows in size, the cost of a lookup grows in proportion to log n, i.e. O(log_2 n)

・Memory size: as the graph grows in size, the memory needed for the index becomes large

Cost of Looking Up Index-tree
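
The logarithmic lookup cost can be illustrated with a Python sketch. A sorted list searched with `bisect` stands in for the index-tree (an assumption for brevity, not how any particular RDBMS stores its index), and the edge table contents are hypothetical:

```python
import bisect
import math


def neighbors_via_index(sorted_edges, source):
    """Find the targets of `source` by binary search over a sorted edge table."""
    # (source,) sorts just before every (source, target) row.
    i = bisect.bisect_left(sorted_edges, (source,))
    targets = []
    while i < len(sorted_edges) and sorted_edges[i][0] == source:
        targets.append(sorted_edges[i][1])
        i += 1
    return targets


def lookup_steps(n):
    """Approximate number of comparisons for a binary search over n rows."""
    return math.ceil(math.log2(n))


EDGE_TABLE = sorted([("A", "B"), ("A", "C"), ("B", "E"), ("C", "D"), ("C", "E")])
neighbors_of_a = neighbors_via_index(EDGE_TABLE, "A")
```

Every hop of a traversal pays this search again, so a table of a million rows costs roughly `lookup_steps(1_000_000)` comparisons per step, and the figure keeps rising as the table grows.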

Page 34: The Definition of GraphDB

Graph DB Model

[Graph Data]

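[Mini-Index] Each vertex holds direct references to its adjacent vertices

[Figure: graph data for vertices A to G; each vertex carries its own mini-index, e.g. A: B, C. Other lists shown: (D, E), (F, F), (G), (E, F, G), (G)]

[Constant time]: the cost of a step depends only on the number of edges connected to the current vertex, not on the size of the graph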

Page 35: The Definition of GraphDB

Mini-Index: Graph DB Model

[Graph Data]

The cost of a local step remains the same

Page 36: The Definition of GraphDB

・Use an external indexing system to index the properties of vertices and edges

Indexing their properties

[Figure: “Graph Databases and Endogenous Indices”. A property graph with edge labels created, follows and cites; properties such as name=twarko age=30, name=ahzf, name=graph_blog views=1000, name=tenderlove gender=male, date=2007/10, name=neo4j views=56781, page_rank=0.023, name=peterneubauer; plus name, views and gender property indices]

The Graph Traversal Programming Pattern
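
A property index of this kind can be sketched in Python as a plain dictionary from property value to vertex ids. The function `build_property_index` and the vertex ids are hypothetical; the property values are taken from the figure:

```python
from collections import defaultdict


def build_property_index(vertices, key):
    """Map each value of property `key` to the ids of vertices that have it."""
    index = defaultdict(list)
    for vertex_id, props in vertices.items():
        if key in props:
            index[props[key]].append(vertex_id)
    return index


# Vertex ids 1 to 3 are made up for this example.
VERTICES = {
    1: {"name": "neo4j", "views": 56781},
    2: {"name": "graph_blog", "views": 1000},
    3: {"name": "twarko", "age": 30},
}
name_index = build_property_index(VERTICES, "name")
views_index = build_property_index(VERTICES, "views")
```

The index is used once to find a starting vertex, for example `name_index["neo4j"]`; from there the traversal follows direct references and does not touch the index again.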

Page 37: The Definition of GraphDB

・GraphDB provides “index-free adjacency”

・No index-tree lookup: each element has direct pointers to its adjacent elements

・They have an external index system for their properties (both nodes and relations)

・A very large graph can be stored on a single server, because the cost of a traversal step is independent of the size of the graph

Summary

Page 38: The Definition of GraphDB

(4) Other Benefits of GraphDB

Page 39: The Definition of GraphDB

A Graph Database Transforms a RDBMS

← RDBMS

↓ GraphDB as RDBMS

Comparing Database Models

Page 40: The Definition of GraphDB

A Graph Database Transforms a Key-Value Store

← RDBMS

↓ GraphDB as Key-Value

Comparing Database Models

Page 41: The Definition of GraphDB

A Graph Database transforms a Document DB

↑ Document DB

↓ GraphDB as RDBMS

Comparing Database Models

Page 44: The Definition of GraphDB

Square Pegs and Round Holes in the NOSQL World

Example of e-commerce site

Recommendation!!

Page 45: The Definition of GraphDB

(1) Graph Type for GraphDB~Which Graph is Better for GraphDB ?~

(2) Graph Traversals~ Graph Query ≡ Graph Traversal ~

(3) Index Free Adjacency~The Key of Definition of GraphDB~

Did you Understand?