The Definition of GraphDB

Post on 15-Jan-2015


Transcript of The Definition of GraphDB

The Definition of GraphDB

@doryokujin

GraphDB Meet-Up Japan #1

・Takahiro Inoue(age 26)

・twitter: doryokujin

・Majored in Math (Statistics & Graph Algorithm)

・Data Scientist

・Leader of MongoDB JP

・Interest: DataProcessing, GraphDB

About Me

(1) Graph Type for GraphDB~Which Graph is Better for GraphDB ?~

(2) Graph Traversals~ Graph Query ≡ Graph Traversal ~

(3) Index Free Adjacency~The Key of Definition of GraphDB~

(4) Other Topics

Agenda

(1) Graph Class for GraphDB~Which Graph is Better for GraphDB ?~

・Graph is an ordered pair G = (V, E)

・Set V of Nodes

・Set E of Edges

- 2 Element Subsets of V

- Representing “Relationship” Between Nodes

- Directed or Undirected

Definition of Graph
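The definition above can be sketched in a few lines of Python. This is only an illustration: the node names and the helper functions `is_valid_undirected` and `is_valid_directed` are hypothetical and do not come from the slides.

```python
# Hypothetical example graph; the node names are illustrative only.
V = {"a", "b", "c"}

# Undirected edges are 2-element subsets of V, so {a, b} equals {b, a}.
E_undirected = {frozenset({"a", "b"}), frozenset({"b", "c"})}

# Directed edges are ordered pairs, so (a, b) differs from (b, a).
E_directed = {("a", "b"), ("b", "c")}


def is_valid_undirected(nodes, edges):
    """Check that every edge is a 2-element subset of the node set."""
    return all(len(e) == 2 and e <= nodes for e in edges)


def is_valid_directed(nodes, arrows):
    """Check that both endpoints of every ordered pair are nodes."""
    return all(u in nodes and v in nodes for u, v in arrows)
```

Using `frozenset` for undirected edges makes the "no orientation" property automatic, while tuples keep the order of a directed edge.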

[Undirected Graph]

・Edges have no orientation

・Not ordered pairs, but sets {u, v} i.e. Edge (a, b) ≡ (b, a)

・All nodes have the same object type

・All edges have the same relationship

Def. Undirected Graph

G = (V, E)

[Directed Graph (Digraph)]

・Ordered pair D = (V, A)

・A: Set of ordered pairs of nodes, called “arrows”

・All nodes have the same object type

・All edges have the same relationship

Def. Directed Graph

D = (V, A)

symmetric

Example: (Un)Directed Graph

[Facebook]

[Figure: undirected graph of users whose edges are all labeled “friend”]

・relationship of all edges: “friend”

・Facebook friendship is symmetric

・node object type: “user”

[Twitter]

[Figure: directed graph of users whose edges are all labeled “follow”]

・relationship of all edges: “follow”

・Twitter follow action is asymmetric

・node object type: “user”

[Mixed Graph]

・Some edges may be directed and some may be undirected

[Multigraph]

・Includes loop edges and multiple edges (directed or undirected)

Def. Mixed Graph, Multigraph

G = (V, E, A)

D = (V, A)

loop

multiple

・These types of graphs can have common representation

・undirected edge --> 2 directed edges

・symmetric edge --> 2 asymmetric directed edges

・allows loop and multiple edge

Common Representation

symmetric undirected

multiple

loop

・No undirected edge

・No symmetric edge

Common Representation
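A minimal sketch of this common representation, assuming edges are stored as a plain list of ordered pairs so that multiple edges and loops are allowed. The function name `to_directed` is hypothetical, and mapping an undirected loop to a single arrow is a choice made for this example, not something the slides specify.

```python
def to_directed(undirected_edges, directed_edges=()):
    """Rewrite a mixed graph as a list of directed edges only.

    Each undirected edge {u, v} becomes the two arrows (u, v) and (v, u).
    A list (not a set) is returned so that multiple edges survive.
    """
    arrows = list(directed_edges)
    for u, v in undirected_edges:
        arrows.append((u, v))
        if u != v:  # assumption: an undirected loop needs only one arrow
            arrows.append((v, u))
    return arrows


# Hypothetical mixed graph: one undirected "friend" edge, one directed edge.
arrows = to_directed([("a", "b")], [("b", "c")])
```

After the conversion, every algorithm only has to handle directed edges, which is why the later graph classes in this talk assume directed edges throughout.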

Def. Single-Relational Graph

[Single-Relational Structures]

・Multigraph

・All edges must be the same relationship

・All nodes must be the same object type

・All graphs already introduced are SR-Graphs

Is this class sufficient for graph database ?

[Multi-Relational Structures]

・More flexible than single-relational structures

・All edges are directed and asymmetric

・Each edge can have a different relationship

・Each node can have a different object type

Def. Multi-Relational Graph

Example: Multi-Relational Graph

・4 types of relationships: “Reply”, “DM”, “RT”, “Block”

・Every node still has the same object type

[Twitter]

[Figure: Twitter users connected by directed edges labeled Reply, DM, RT, and Block]

Example: Multi-Relational Graph

・Many types of relationship

・Connection: user --> item

・Connection: user <--> user

[Livlis] http://www.livlis.com/

[Figure: users and items connected by edges labeled want!, like!, bought!, exhibit, follow, invite, and message]

Def. Property Graph

[Property Graph]

・Multi-Relational Graph

・Each node and edge has some properties

・Each property is represented as a “key-value” pair and is schema-free

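[Figure: user nodes and a “follow” edge, each carrying key-value properties. Node: id=id_A, follow=100, follower=200, since=2011/01/01. Node: id=id_B, follow=500, follower=1000, since=2011/06/01. Edge: follow, since=2011/01/23. A further property list reads name=B, follow=10, follower=20, sex=man.]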
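One way to picture a property graph in code is shown below. This is a sketch under assumptions: the class names `Node` and `Edge`, the field names, and the direction of the “follow” edge (id_A follows id_B) are hypothetical; only the property keys and values are taken from the figure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    # Schema-free key-value properties: any keys, any values.
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str   # the relationship type, e.g. "follow"
    tail: Node   # where the directed edge starts
    head: Node   # where the directed edge ends
    properties: dict = field(default_factory=dict)


user_a = Node("id_A", {"follow": 100, "follower": 200, "since": "2011/01/01"})
user_b = Node("id_B", {"follow": 500, "follower": 1000, "since": "2011/06/01"})

# Assumed direction: id_A follows id_B.
follows = Edge("follow", user_a, user_b, {"since": "2011/01/23"})
```

Because `properties` is an ordinary dictionary, two nodes do not need to share the same keys, which is what “schema-free” means here.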

Example: Property Graph

[Livlis]

[Figure: the Livlis graph (edges labeled want!, follow, like!, exhibit, invite, bought!, message) with key-value properties attached to each element, e.g. a user with name=A, follow=100, follower=200, sex=man; an item with since=01/01/01, price=$50, access=500, wanted=10, liked=30; other properties shown include favorite=50, since=01/01/01, and price=$50. The remaining property lists are elided as “...”.]

Def. Hyper Graph

[Hyper Graph]

・Set V of Nodes

・Set E of non-empty subsets of V

・i.e. Edge can point to more than two nodes

・Every node or edge carries an arbitrary value as payload

・Property Graph ⊂ Hyper Graph

H = (V, E)

Sones: manage edge types with GraphDB 2.1

・The Property Graph has a flexible representation

・Key features:

- All edges are directed and asymmetric

- Each edge can have a different relationship

- Each node can have a different object type

- All elements have properties in key-value style

・Many GraphDBs support the Property Graph model

※ Some GraphDBs support the Hyper Graph model

Summary

(2) Graph Traversals~ Graph Query ≡ Graph Traversal ~

Graph Query ≡ Graph Traversal 

・Not a “global” search as in an RDBMS or other NoSQL databases

・But a traversal over the graph starting from a “root node”

・“Locality” is very important

Graph Traversals

Property Graph Algorithms

・To traverse a graph is to process every node in the graph exactly once

・The two most common traversal patterns are breadth-first traversal and depth-first traversal

・At each step, the traverser moves to its adjacent vertices

・Repeat the step a fixed number of times or until some condition is fulfilled

Graph Traversals
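The bullets above can be sketched as a depth-limited breadth-first traversal. This is an illustrative sketch, not code from the talk: the function name `breadth_first`, the `max_depth` parameter, and the sample graph are assumptions.

```python
from collections import deque


def breadth_first(adjacency, root, max_depth):
    """Visit nodes level by level from root, each node exactly once,
    stopping after max_depth steps away from the root."""
    visited = {root}
    order = [root]
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # stop condition: do not expand beyond max_depth
        for neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                order.append(neighbor)
                queue.append((neighbor, depth + 1))
    return order


# Hypothetical graph given as adjacency lists.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": []}
one_step = breadth_first(graph, "A", 1)
two_steps = breadth_first(graph, "A", 2)
```

Swapping the queue for a stack (popping from the same end that is pushed) would turn this into a depth-first traversal.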

The Graph Traversal Pattern

[Figure: vertex 1 (name=Alberto Pepe) linked by friend-labeled edges to vertices 2, 3, and 4 (name=...); the steps of the traversal are marked e_out, e_lab+ (friend), v_in, and name]

Fig. 3. A single path along the f traversal.

those edges with the label friend, then traverse to the incoming (i.e. head) vertices on those friend-labeled edges. Finally, of those vertices, return their name property.^21 A single legal path according to this function is diagrammed in Figure 3. Though not diagrammed for the sake of clarity, the traversal would also go from vertex 1 to the name of vertex 2 and vertex 3. The function f is a “higher-order” adjacency defined as the composition of explicit adjacencies and serves as a join of Alberto and his friend’s names.^22 The remainder of this section demonstrates graph traversals in real-world problem-solving situations.

3.1 Traversing for Recommendation

Recommendation systems are designed to help people deal with the problem of information overload by filtering information in the system that doesn’t pertain to the person [14]. In a positive sense, recommendation systems focus a person’s attention on those resources that are likely to be most relevant to their particular situation. There is a standard dichotomy in recommendation research: that of content- vs. collaborative filtering-based recommendation. The prior deals with recommending resources that share characteristics (i.e. content) with a set of resources. The latter is concerned with determining the similarity of resources based upon the similarity of the taste of the people modeled within the system [6]. These two seemingly different techniques to recommendation are conveniently solved using a graph database and two simple traversal techniques [10, 5]. Figure 4 presents a toy graph data set, where there exist a set of people, resources, and features related to each other by likes- and feature-labeled edges. This simple data set is used for the remaining examples of this subsection.

^21 Note that the order of a composition is evaluated from right to left.

^22 This is known as a virtual edge in the graph system called DEX [9].

・Single step traversal: from element i to element j, where i, j ∈ (V ∪ E).

・Can define graph traversals of arbitrary length from single step traversal

・Querying is performed through traversals, which can perform millions of "joins" per second
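The friend-name traversal described in the excerpt can be sketched as a composition of single steps. Everything here is a hypothetical illustration: the edge triples, the extra “coworker” edge, the placeholder names for vertices 2 to 5, and the function names are assumptions chosen to mirror the steps (outgoing edges, label filter, head vertices, name property).

```python
# Edges as (tail, label, head) triples. Vertex 5 and its "coworker" edge
# are invented here to show that the label filter removes something.
edges = [(1, "friend", 2), (1, "friend", 3), (1, "friend", 4),
         (1, "coworker", 5)]

# Only "Alberto Pepe" comes from the figure; the rest are placeholders.
names = {1: "Alberto Pepe", 2: "name-2", 3: "name-3", 4: "name-4",
         5: "name-5"}


def e_out(vertices):
    """Single step: vertices -> their outgoing edges."""
    return [e for e in edges if e[0] in vertices]


def e_label(es, label):
    """Single step: keep only the edges carrying the given label."""
    return [e for e in es if e[1] == label]


def v_in(es):
    """Single step: edges -> their head (incoming) vertices."""
    return [e[2] for e in es]


def name(vertices):
    """Single step: vertices -> their name property."""
    return [names[v] for v in vertices]


def friend_names(i):
    """The composed traversal, evaluated from the innermost call outward."""
    return name(v_in(e_label(e_out([i]), "friend")))


result = friend_names(1)
```

Each helper is one single-step traversal, and `friend_names` is the longer traversal built by composing them, which plays the role of a join between a person and the names of their friends.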

Graph Traversals

The Graph Traversal Pattern

Graph Traversals

Basic Graph Traversals

・GraphDB is efficient with respect to local data analysis (Recommendation, Social Analytics, Shortest Path), which all focus on “a user”

・Locality is defined by direct referent structures

・Frame all solutions to problems as a traversal over local regions of the graph

Summary

(3) Index Free Adjacency~The Key of Definition of GraphDB~

※ A GraphDB is not just any database that can model graph structures (an RDB, a document store, etc. can also do that)

[definition]

・A graph database is any storage system that provides “index-free adjacency”

The Definition of GraphDB

The Graph Traversal Programming Pattern

[Important feature]

・Mini Index: Every element (node or edge) has a direct pointer to its adjacent element

・No Index lookup: we can determine which vertex is adjacent to which other vertex without looking up an index-tree

The Definition of GraphDB

Relational Data Model

[Figure: table with columns column1, column2, column3, followed by the numbers 1 to 8]

[Index-tree]

[Graph data in table]

Graph Databases and Endogenous Indices

[Figure: property graph with edges labeled created, follows, and cites. Properties shown: name=twarko, age=30; name=ahzf; name=graph_blog, views=1000; name=tenderlove, gender=male; date=2007/10; name=neo4j, views=56781; page_rank=0.023; name=peterneubauer. External indices: name property index, views property index, gender property index.]

Graph Databases Make Use of Indices

[Figure: the graph with vertices A, B, C, D, E, next to an index of vertices (by id)]

• There is more to the graph than the explicit graph structure.

• Indices index the vertices, by their properties (e.g. ids).

Indexing of Vertices

Graph Data

The Graph Traversal Programming Pattern

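[Index-tree] [Graph Data]

[Figure: graph with vertices A, B, C, D, E, and an index-tree over the keys A to E whose leaves hold adjacency lists such as (B, C)]

1. Want to determine neighbors of A

2. Looking up the index-tree: log_2(n) time cost

3. Getting the adjacency list (B, C)

4. Moving to either B or C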

Relational Data Model

[Index-tree] [Graph Data]

Lookup cost grows as the graph grows: O(log_2 n)

Lookup cost becomes very high

Relational Data Model

Traversal takes a long time

Relational Data Model

・Insert time: as the graph grows in size, the cost of an insert becomes high

・Lookup time: as the graph grows in size, the cost of a lookup grows in proportion to log n, i.e. O(log_2 n)

・Memory size: as the graph grows in size, the memory size becomes large

Cost of Looking Up Index-tree
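To make the lookup cost concrete, here is a small sketch of index-based adjacency. It is an assumption-laden stand-in: a sorted list searched with `bisect` plays the role of the index-tree, and the adjacency lists (other than A holding B and C) are hypothetical.

```python
import bisect

# A global sorted index standing in for the index-tree: vertex ids and,
# in parallel, their adjacency lists. Only A -> (B, C) is from the slides.
index_keys = ["A", "B", "C", "D", "E"]
index_values = [["B", "C"], ["E"], ["D", "E"], [], []]


def neighbors_via_index(vertex):
    """Find the adjacency list of a vertex with a binary search.

    Every call costs O(log n) comparisons, where n is the total number
    of vertices in the index, regardless of how local the question is.
    """
    pos = bisect.bisect_left(index_keys, vertex)
    if pos < len(index_keys) and index_keys[pos] == vertex:
        return index_values[pos]
    raise KeyError(vertex)


# Each traversal step repeats the global lookup.
first_hop = neighbors_via_index("A")
second_hop = neighbors_via_index(first_hop[0])
```

A traversal of k steps pays the index lookup k times, so the total cost grows with the size of the whole graph even when the traversal touches only a small neighborhood.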

Graph DB Model

[Graph Data]

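[Mini-Index] direct references to its adjacent vertices

[Figure: graph with vertices A to G; each vertex carries its own mini-index of adjacent vertices, e.g. A holds (B, C)]

[Constant time]: the cost of a step depends only on the number of edges connected to the current vertex, not on the size of the graph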

Mini-Index: Graph DB Model

[Graph Data]

The cost of a local step remains the same
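Index-free adjacency can be sketched with plain object references. This is a minimal illustration, not how any particular graph database stores data: the `Vertex` class, the `walk` function, and the three-vertex graph are hypothetical.

```python
class Vertex:
    def __init__(self, name):
        self.name = name
        # The "mini-index": direct references to adjacent vertices.
        self.out = []


# Hypothetical graph: A -> B, A -> C, B -> C.
a, b, c = Vertex("A"), Vertex("B"), Vertex("C")
a.out.extend([b, c])
b.out.append(c)


def walk(start, choose, steps):
    """Follow direct references from vertex to vertex.

    No global index is consulted; each step only looks at the current
    vertex's own list of neighbors.
    """
    current = start
    path = [current.name]
    for _ in range(steps):
        if not current.out:
            break  # dead end: no outgoing references
        current = choose(current.out)
        path.append(current.name)
    return path


path = walk(a, lambda neighbors: neighbors[0], 2)
```

Adding a million unrelated vertices to this graph would not change the work done by `walk`, which is the sense in which the cost of a local step stays the same.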

・Build an external indexing system to index the properties of vertices and edges

Indexing their properties
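An external property index can be pictured as a map from (key, value) pairs to element ids, kept separately from the graph structure. The class name `PropertyIndex`, its methods, and the element ids `v1` and `v2` are assumptions for this sketch; the property values echo those in the figure.

```python
from collections import defaultdict


class PropertyIndex:
    """Maps a (key, value) property pair to the ids of matching elements."""

    def __init__(self):
        self._index = defaultdict(set)

    def put(self, key, value, element_id):
        self._index[(key, value)].add(element_id)

    def get(self, key, value):
        # Return a copy so callers cannot mutate the index by accident.
        return set(self._index.get((key, value), ()))


index = PropertyIndex()
index.put("name", "neo4j", "v1")        # hypothetical element ids
index.put("name", "graph_blog", "v2")
index.put("gender", "male", "v2")

roots = index.get("name", "neo4j")
```

The index is used only to find the starting (root) element of a query; from there, the traversal proceeds through direct references without touching the index again.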

Graph Databases and Endogenous Indices

[Figure: property graph (edges labeled created, follows, cites) with external name, views, and gender property indices over element properties such as name=neo4j, views=56781, and gender=male]

The Graph Traversal Programming Pattern

・GraphDB provides “index-free adjacency”

・No index-tree lookup: each element has direct pointers to its adjacent elements

・They have an external index system for their properties (both nodes and relations)

・A very large graph can be stored on a single server, because the cost of a traversal step is independent of the size of the graph

Summary

(4) Other Benefits of GraphDB

A Graph Database Transforms a RDBMS

← RDBMS

↓ GraphDB as RDBMS

Comparing Database Models

A Graph Database Transforms a Key-Value Store

← RDBMS

↓ GraphDB as Key-Value

Comparing Database Models

A Graph Database transforms a Document DB

↑ Document DB

↓ GraphDB as RDBMS

Comparing Database Models

Square Pegs and Round Holes in the NOSQL World

Example of e-commerce site

Recommendation!!

(1) Graph Type for GraphDB~Which Graph is Better for GraphDB ?~

(2) Graph Traversals~ Graph Query ≡ Graph Traversal ~

(3) Index Free Adjacency~The Key of Definition of GraphDB~

Did you Understand?