Memoirs of a Graph Addict: Despair to Redemption

Memoirs of a Graph Addict: Despair to Redemption. Marko A. Rodriguez, Graph Systems Architect. http://markorodriguez.com http://twitter.com/twarko Winter Whirlwind Tour – Chicago to Malmö – January 10-14, 2011. January 8, 2011

Transcript of Memoirs of a Graph Addict: Despair to Redemption

Page 1: Memoirs of a Graph Addict: Despair to Redemption

Memoirs of a Graph Addict:

Despair to Redemption

Marko A. Rodriguez, Graph Systems Architect

http://markorodriguez.com

http://twitter.com/twarko

Winter Whirlwind Tour – Chicago to Malmö – January 10-14, 2011

January 8, 2011

Page 2: Memoirs of a Graph Addict: Despair to Redemption

Abstract

A graph database provides a means of linking together objects using direct references. In other words, in order to determine if one object is adjacent to another, no index lookup is required. In contrast to relational databases, in a graph database, there is no notion of a join operation as the graph is already an explicitly joined structure. Given a graph, problems are solved using graph traversals, that is, directed walks over the objects and relations that compose the graph. This lecture has three primary points of discussion. The first is a description of graph database technology. The second, a memoir of the speaker’s applied and theoretical work with graphs. The third and final point, a review of an open source graph processing stack currently being developed by AT&T Interactive and its collaborators.

Page 6: Memoirs of a Graph Addict: Despair to Redemption

For 10 years now, I’ve dealt with a painful graph addiction...Let me share my story with you.

Page 7: Memoirs of a Graph Addict: Despair to Redemption

Outline

• Graph Structures

• Graph Databases

• Graph Applications

• TinkerPop Product Suite

Page 9: Memoirs of a Graph Addict: Despair to Redemption

Graph Data Structure Pieces: Part 1

[Figure: a vertex (thing, object, dot) and an edge (relation, join, line) are both elements, and every element has an id.]

Page 10: Memoirs of a Graph Addict: Despair to Redemption

Single-Relational Graph

[Figure: a graph over the vertices marko, peter, tinkerpop, neotech, neo4j, blueprints, and gremlin, connected by unlabeled edges.]

In single-relational graphs, things are related. Unfortunately, this is not a very useful structure for most domain modeling situations. Relatedness is too generic: all edges have the same meaning.

Page 11: Memoirs of a Graph Addict: Despair to Redemption

Graph Data Structure Pieces: Part 2

[Figure: a vertex (thing, object, dot) and an edge (relation, join, line) are both elements. Every element has an id; an edge also has a label.]

Page 12: Memoirs of a Graph Addict: Despair to Redemption

Multi-Relational Graph

[Figure: the vertices marko, peter, tinkerpop, neotech, neo4j, blueprints, and gremlin, now connected by edges labeled knows, member, created, and imports.]

By adding labels to the edges, it's possible to denote the type of relation that exists between any two vertices. Now it's possible to denote different types of things and the different ways in which they relate to one another.

Page 13: Memoirs of a Graph Addict: Despair to Redemption

Graph Data Structure Pieces: Part 3

[Figure: a vertex (thing, object, dot) and an edge (relation, join, line) are both elements. Every element has an id and a property map of key/value properties (key1=value1, key2=value2); an edge also has a label.]

Page 14: Memoirs of a Graph Addict: Despair to Redemption

Property Graph

[Figure: the multi-relational graph over marko, peter, tinkerpop, neotech, neo4j, blueprints, and gremlin (edges labeled knows, member, created, and imports), with key/value properties added. Properties shown: lang=java, use=traverse; lang=java, use=api; lang=java, use=graphdb; and date=2009 (twice).]

Allow elements to have key/value properties. In particular, this is very useful for further specifying the meaning of an edge. “When did TinkerPop create Gremlin?”
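The property graph described above can be sketched in a few lines of Python. This is only an illustration, not the Blueprints API: the class names, the `when_created` helper, and the placement of `date=2009` on the two `created` edges are assumptions made for the example (the slide shows the property values but the transcript does not say which element carries which).

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    out_v: str   # id of the tail vertex
    label: str   # type of the relation
    in_v: str    # id of the head vertex
    properties: dict = field(default_factory=dict)


# Assumed placement of the properties shown on the slide.
vertices = {v.id: v for v in [
    Vertex("tinkerpop"),
    Vertex("gremlin", {"lang": "java", "use": "traverse"}),
    Vertex("blueprints", {"lang": "java", "use": "api"}),
]}

edges = [
    Edge("tinkerpop", "created", "gremlin", {"date": 2009}),     # assumption
    Edge("tinkerpop", "created", "blueprints", {"date": 2009}),  # assumption
    Edge("gremlin", "imports", "blueprints"),                    # assumption
]


def when_created(creator, creation):
    """Return the date property of every 'created' edge from creator to creation."""
    return [
        e.properties.get("date")
        for e in edges
        if e.out_v == creator and e.label == "created" and e.in_v == creation
    ]


print(when_created("tinkerpop", "gremlin"))
```

The question "When did TinkerPop create Gremlin?" is answered by reading a property off an edge, which is exactly what edge properties add over a plain multi-relational graph.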

Page 15: Memoirs of a Graph Addict: Despair to Redemption

Numerous Graph Types

[Figure: a collage of graph types: multi, weighted, directed, undirected, simple, pseudo, hyper, half-edge, semantic, vertex-labeled, edge-labeled, vertex-attributed, edge-attributed, and resource description framework. Example annotations shown: http://ex.com/123, a, 0.2, knows, hired, name=emil, type=person, created=2-01-09, modified=2-11-09.]

Rodriguez, M.A., Neubauer, P., “Constructions from Dots and Lines,” Bulletin of the American Society for Information Science and Technology, 36(6), pp. 35-41, 2010. [http://arxiv.org/abs/1006.2361]

Page 16: Memoirs of a Graph Addict: Despair to Redemption

Property Graph as a Rich Structure

[Figure: a diagram of how a property graph can be turned into other graph types: weighted graph, semantic graph, rdf graph, labeled graph, multi-graph, directed graph, undirected graph, and simple graph. Arrow labels shown: add weight attribute; make labels URIs; remove attributes; remove edge labels; remove directionality; remove loops, directionality, and multiple edges; no op.]

A fun related thought: Rodriguez, M.A., “Mapping Semantic Networks to Undirected Networks,” International Journal of Applied Mathematics and Computer Sciences, 4(1), pp. 39–42, 2009. [http://arxiv.org/abs/0804.0277]
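A few of the transformations named on this slide can be sketched as plain functions. This is a minimal illustration under assumptions: edges are represented as tuples, and the function names and sample data are invented for the example rather than taken from the talk.

```python
# A property-graph edge is modeled here as (out_vertex, label, in_vertex, properties).
property_edges = [
    ("marko", "knows", "peter", {"since": 2009}),   # hypothetical sample data
    ("marko", "member", "tinkerpop", {}),
    ("peter", "member", "tinkerpop", {}),
    ("marko", "knows", "marko", {}),                # a loop, to show its removal
]


def remove_attributes(edges):
    """Property graph -> labeled graph: drop the property maps."""
    return [(out_v, label, in_v) for out_v, label, in_v, _props in edges]


def remove_edge_labels(labeled_edges):
    """Labeled graph -> directed graph: drop the labels."""
    return [(out_v, in_v) for out_v, _label, in_v in labeled_edges]


def to_simple_graph(directed_edges):
    """Directed graph -> simple graph: drop loops, directionality, multiple edges."""
    return {frozenset((out_v, in_v)) for out_v, in_v in directed_edges if out_v != in_v}


labeled = remove_attributes(property_edges)
directed = remove_edge_labels(labeled)
simple = to_simple_graph(directed)
print(sorted(sorted(pair) for pair in simple))
```

Each step only discards information, which is why the property graph can stand in for the simpler graph types.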

Page 17: Memoirs of a Graph Addict: Despair to Redemption

Graph Algorithms in Single-Relational Graphs

• Most graph algorithms are designed for single-relational graphs.[1]

  * Geodesic: shortest path, eccentricity, diameter, closeness centrality, betweenness centrality, etc.

  * Eigenvector: spreading activation, pagerank, eigenvector centrality, etc.

  * Assortative: scalar, assortative, etc.

[1] Excellent book reviewing numerous graph algorithms: Brandes U., Erlebach, T., “Network Analysis: Methodological Foundations,” Springer, 2005.

Page 18: Memoirs of a Graph Addict: Despair to Redemption

Graph Algorithms in Multi-Relational+ Graphs

• Most real-world software systems require multi-relational+ graphs. E.g.: Who are the most central coauthors when all I know is wrote?

[Figure: a graph with six wrote edges, from which two coauthor edges are inferred.]

• A key concept when evaluating graph algorithms over multi-relational+ graphs is implicit adjacency/path descriptions/virtual edges/etc.[2]

[2] Rodriguez M.A., Shinavier, J., “Exposing Multi-Relational Networks to Single-Relational Network Analysis Algorithms,” Journal of Informetrics, 4(1), pp. 29–41, 2009. [http://arxiv.org/abs/0806.2274]
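The idea of an implicit (virtual) edge can be sketched as follows: a coauthor edge is inferred between two authors whenever both have a wrote edge to the same paper, and a single-relational measure (here, plain degree) is then computed over the inferred edges. The author and paper names and the helper functions are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Explicit edges: (author, paper) pairs, i.e. author --wrote--> paper.
wrote = [
    ("alice", "p1"), ("bob", "p1"),
    ("bob", "p2"), ("carol", "p2"),
]


def coauthor_edges(wrote_edges):
    """Infer undirected coauthor edges from the path author -wrote-> paper <-wrote- author."""
    authors_of = defaultdict(set)
    for author, paper in wrote_edges:
        authors_of[paper].add(author)
    inferred = set()
    for authors in authors_of.values():
        for x, y in combinations(sorted(authors), 2):
            inferred.add((x, y))
    return inferred


def degree_centrality(undirected_edges):
    """Count how many inferred edges touch each vertex."""
    degree = Counter()
    for x, y in undirected_edges:
        degree[x] += 1
        degree[y] += 1
    return degree


coauthors = coauthor_edges(wrote)
centrality = degree_centrality(coauthors)
print(centrality.most_common(1))
```

Degree is used here only because it is the simplest single-relational measure; any of the algorithm families listed earlier could be applied to the inferred coauthor graph instead.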

Page 19: Memoirs of a Graph Addict: Despair to Redemption

Outline

• Graph Structures

• Graph Databases

• Graph Applications

• TinkerPop Product Suite

Page 20: Memoirs of a Graph Addict: Despair to Redemption

The Simplicity of a Graph

• A graph is a simple data structure.

• A graph states that something is related to something else (the foundation of any other data structure).[3]

• It is possible to model a graph in various types of databases.[4]

  * Relational database: MySQL, Oracle, PostgreSQL

  * JSON document database: MongoDB, CouchDB

  * XML document database: MarkLogic, eXist-db

  * etc.

[3] A graph can be used to represent other data structures. This point becomes convenient when looking beyond using graphs for typical, real-world domain models (e.g. friends, favorites, etc.), and seeing their applicability in other areas such as modeling code (e.g. http://arxiv.org/abs/0802.3492), indices, etc.

[4] For the sake of diagram clarity, the examples to follow are with respect to a single-relational, directed graph. Note that it is possible to model multi-relational graphs in these types of database as well.

Page 21: Memoirs of a Graph Addict: Despair to Redemption

Representing a Graph in a Relational Database

outV | inV
-----------
A    | B
A    | C
C    | D
D    | A

[Figure: the directed graph over the vertices A, B, C, and D that the table encodes.]
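The edge table above can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table name `edge`, the index name, and the helper functions are invented for the example; only the `outV`/`inV` columns and the four rows come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (outV TEXT, inV TEXT)")
conn.executemany(
    "INSERT INTO edge VALUES (?, ?)",
    [("A", "B"), ("A", "C"), ("C", "D"), ("D", "A")],
)
# Adjacency is resolved through this index, not through direct references.
conn.execute("CREATE INDEX edge_outV ON edge (outV)")


def out_neighbors(vertex):
    """One step: look up the rows whose tail is the given vertex."""
    rows = conn.execute(
        "SELECT inV FROM edge WHERE outV = ? ORDER BY inV", (vertex,)
    )
    return [row[0] for row in rows]


def two_steps(vertex):
    """Two steps: every additional step costs one more self-join."""
    rows = conn.execute(
        "SELECT DISTINCT e2.inV FROM edge e1 "
        "JOIN edge e2 ON e1.inV = e2.outV "
        "WHERE e1.outV = ? ORDER BY e2.inV",
        (vertex,),
    )
    return [row[0] for row in rows]


print(out_neighbors("A"), two_steps("A"))
```

The graph is fully recoverable from the table, but each traversal step is expressed as a lookup or join against the whole edge set.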

Page 22: Memoirs of a Graph Addict: Despair to Redemption

Representing a Graph in a JSON Database

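{
  "A": { "outE": ["B", "C"] },
  "B": { "outE": [] },
  "C": { "outE": ["D"] },
  "D": { "outE": ["A"] }
}

[Figure: the directed graph over the vertices A, B, C, and D that the document encodes.]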
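A small sketch of walking the JSON representation with Python's standard `json` module. The document text is written here as strict JSON (quoted keys and values), and the helper names are assumptions made for the example.

```python
import json

DOCUMENT = """
{
  "A": {"outE": ["B", "C"]},
  "B": {"outE": []},
  "C": {"outE": ["D"]},
  "D": {"outE": ["A"]}
}
"""

graph = json.loads(DOCUMENT)


def out_neighbors(vertex):
    """Adjacent vertices are found by looking the vertex up by its key."""
    return graph[vertex]["outE"]


def reachable(start):
    """All vertices reachable from start (start included), by depth-first search."""
    seen = set()
    stack = [start]
    while stack:
        vertex = stack.pop()
        if vertex in seen:
            continue
        seen.add(vertex)
        stack.extend(out_neighbors(vertex))
    return seen


print(out_neighbors("A"), sorted(reachable("A")))
```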

Page 23: Memoirs of a Graph Addict: Despair to Redemption

Representing a Graph in an XML Database

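<graphml>
  <graph>
    <node id="A" />
    <node id="B" />
    <node id="C" />
    <node id="D" />
    <edge source="A" target="B" />
    <edge source="A" target="C" />
    <edge source="C" target="D" />
    <edge source="D" target="A" />
  </graph>
</graphml>

[Figure: the directed graph over the vertices A, B, C, and D that the document encodes.]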
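The XML form can be read back with Python's standard `xml.etree.ElementTree`. This sketch uses the simplified markup from the slide with quoted attribute values and no XML namespace; a real GraphML file declares a namespace, which would change the tag names passed to `iter`.

```python
import xml.etree.ElementTree as ET

GRAPHML = """
<graphml>
  <graph>
    <node id="A" /><node id="B" /><node id="C" /><node id="D" />
    <edge source="A" target="B" />
    <edge source="A" target="C" />
    <edge source="C" target="D" />
    <edge source="D" target="A" />
  </graph>
</graphml>
"""

root = ET.fromstring(GRAPHML)

# Vertices and edges are recovered by scanning the document's elements.
nodes = [node.get("id") for node in root.iter("node")]
edges = [(edge.get("source"), edge.get("target")) for edge in root.iter("edge")]


def out_neighbors(vertex):
    """Adjacency is implicit: it is found by filtering the full edge list."""
    return [target for source, target in edges if source == vertex]


print(nodes, out_neighbors("A"))
```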

Page 24: Memoirs of a Graph Addict: Despair to Redemption

Defining a Graph Database

“If any database can represent a graph, then what is a graph database?”

Page 25: Memoirs of a Graph Addict: Despair to Redemption

Defining a Graph Database

A graph database is any storage system that provides index-free adjacency.

Page 26: Memoirs of a Graph Addict: Despair to Redemption

Defining a Graph Database by Example

[Figure: a toy graph over the vertices A, B, C, D, and E, and a gremlin (our stuntman).]

Page 27: Memoirs of a Graph Addict: Despair to Redemption

Graph Databases and Index-Free Adjacency

[Figure: the toy graph over the vertices A, B, C, D, and E.]

• Our gremlin is at vertex A.

• In a graph database, vertex A has direct references to its adjacent vertices.

• Moving from A to B and C has a constant time cost with respect to the size of the graph. The cost depends only on the number of edges emanating from vertex A (local).

Page 28: Memoirs of a Graph Addict: Despair to Redemption

Graph Databases and Index-Free Adjacency

[Figure: the toy graph over the vertices A, B, C, D, and E.]

The Graph (explicit)

Page 30: Memoirs of a Graph Addict: Despair to Redemption

Non-Graph Databases and Index-Based Adjacency

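[Figure: the toy graph over the vertices A, B, C, D, and E, next to an index whose entries map a vertex to its adjacent vertices (for example, A to B,C).]

• Our gremlin is at vertex A.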

Page 31: Memoirs of a Graph Addict: Despair to Redemption

Non-Graph Databases and Index-Based Adjacency

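[Figure: the toy graph over the vertices A, B, C, D, and E, next to an index whose entries map a vertex to its adjacent vertices (for example, A to B,C).]

• In a non-graph database, the gremlin needs to look at an index to determine what is adjacent to A.

• Moving to B and C has a log(n) time cost. The cost depends on the total number of vertices and edges in the database (global).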
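The difference between the two kinds of adjacency can be sketched side by side. This is an illustration, not how any particular database is implemented: the class, the function names, and most of the toy edges are assumptions (the slides only state that A is adjacent to B and C), and a sorted list searched with `bisect` stands in for a database index.

```python
from bisect import bisect_left

# Assumed toy edges; only A -> B and A -> C are stated on the slides.
EDGES = [("A", "B"), ("A", "C"), ("B", "E"), ("C", "D"), ("C", "E")]


class Vertex:
    """Index-free adjacency: each vertex holds direct references to its neighbors."""

    def __init__(self, name):
        self.name = name
        self.out = []


vertices = {name: Vertex(name) for name in "ABCDE"}
for tail, head in EDGES:
    vertices[tail].out.append(vertices[head])


def neighbors_index_free(vertex):
    # Cost depends only on the number of edges leaving this vertex (local).
    return [w.name for w in vertex.out]


# Index-based adjacency: a global, sorted edge table searched by binary search.
edge_index = sorted(EDGES)


def neighbors_index_based(name):
    # The binary search costs O(log n) in the total number of edges (global).
    i = bisect_left(edge_index, (name, ""))
    found = []
    while i < len(edge_index) and edge_index[i][0] == name:
        found.append(edge_index[i][1])
        i += 1
    return found


print(neighbors_index_free(vertices["A"]), neighbors_index_based("A"))
```

Both functions return the same neighbors; what differs is whether the cost of a step grows with the whole data set or only with the local neighborhood.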

Page 32: Memoirs of a Graph Addict: Despair to Redemption

Non-Graph Databases and Index-Based Adjacency

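[Figure: the toy graph over the vertices A, B, C, D, and E, next to an index whose entries map a vertex to its adjacent vertices (for example, A to B,C).]

The Index (explicit)

The Graph (implicit)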

Page 34: Memoirs of a Graph Addict: Despair to Redemption

Index-Free Adjacency

• While any database can implicitly represent a graph, only a graph database makes the graph structure explicit.[5]

• In a graph database, each vertex serves as a “mini index” of its adjacent elements.[6]

• Thus, as the graph grows in size, the cost of a local step remains the same.[7]

[5] Please see http://markorodriguez.com/Blarko/Entries/2010/3/29_MySQL_vs._Neo4j_on_a_Large-Scale_Graph_Traversal.html for some performance characteristics of graph traversals in a relational database (MySQL) and a graph database (Neo4j).

[6] Each vertex can be interpreted as a “parent node” in an index with its children being its adjacent elements. In this sense, traversing a graph is analogous in many ways to traversing an index, albeit the graph is not an acyclic connected graph (tree). (A vision espoused by Craig Taverner.)

[7] A graph, in many ways, is like a distributed index.

Page 35: Memoirs of a Graph Addict: Despair to Redemption

Graph Query = Graph Traversal

• Graph databases are optimized for graph-theoretic operations (e.g. graph traversals).

• Graph databases are not optimized for set-theoretic operations (e.g. union, intersection, theta-join).

• The graph traversal pattern:[8]

  * Given some root set of elements, traverse in X fashion to yield some side-effect and/or destination.

[8] Rodriguez, M.A., Neubauer, P., “The Graph Traversal Pattern,” Graph Data Management: Techniques and Applications, eds. S. Sakr, E. Pardede, IGI Global, 2011. http://arxiv.org/abs/1004.1001
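The graph traversal pattern can be sketched in its simplest form: start from a root set, repeatedly step along outgoing edges, and collect both a destination (where the walkers end up) and a side-effect (how often each vertex was visited on the way). The adjacency dictionary reuses the four-vertex example graph from earlier in the talk; the function name and return shape are assumptions made for this sketch.

```python
from collections import Counter

# Out-adjacency of the example graph: A -> B, A -> C, C -> D, D -> A.
graph = {"A": ["B", "C"], "B": [], "C": ["D"], "D": ["A"]}


def traverse(roots, steps):
    """Walk `steps` hops from `roots`; return (destination, side-effect)."""
    visits = Counter()      # side-effect: how often each vertex is reached
    frontier = list(roots)  # current position of every walker
    for _ in range(steps):
        frontier = [head for tail in frontier for head in graph[tail]]
        visits.update(frontier)
    return frontier, visits


destination, visits = traverse(["A"], 2)
print(destination, dict(visits))
```

Here "traverse in X fashion" is fixed to "follow every outgoing edge"; filtering by edge label or property would be the natural place to vary it.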

Page 36: Memoirs of a Graph Addict: Despair to Redemption

Outline

• Graph Structures

• Graph Databases

• Graph Applications

• TinkerPop Product Suite

Page 37: Memoirs of a Graph Addict: Despair to Redemption

Adventures in Graphlandia

My graph disease first started in 2001 and it’s only progressed since...

• Collective decision making: graph-based voting.

• Eudaemonic engine: graph-based recommendation.

• Universal computer: graph-based computing.

Page 38: Memoirs of a Graph Addict: Despair to Redemption

Collective Decision Making: Fall of the Modern World

The year is 2014.

Page 39: Memoirs of a Graph Addict: Despair to Redemption

Oil production has dropped significantly. Any reserves that are left are too expensive to purchase. Nations cannot transport food.[9]

Regions with poor agriculture yield famine.

[9] Peak oil available at http://en.wikipedia.org/wiki/Peak_oil.

Page 40: Memoirs of a Graph Addict: Despair to Redemption

People are in shock, fear, and panic over the fall of the modern world.

The world sees a 75% drop in human population.

Page 41: Memoirs of a Graph Addict: Despair to Redemption

The technology and knowledge of the modern world still exists.

The social infrastructure doesn’t....A few rise to create a new world order.[10]

[10] Watkins, J.H., M.A. Rodriguez, “A Survey of Web-Based Collective Decision Making Systems,” Studies in Computational Intelligence: Evolution of the Web in Artificial Intelligence Environments, eds. R. Nayak, N. Ichalkaranje, and L.C. Jain, pp. 245-279, 2008. [http://escholarship.org/uc/item/04h3h1cr]

Page 42: Memoirs of a Graph Addict: Despair to Redemption

Collective Decision Making: Rise of the Machines

Four strong, brave men begin the journey to stability. Decisions need to be made regarding how to determine and execute social goals. The distributed collective of TinkerPop is created.

• Marko Rodriguez (former USA)

• Peter Neubauer (former Sweden)

• Josh Shinavier (former China)

• Pavel Yaskevich (former Belarus)

[Figure: the four members of the collective: marko, josh, pavel, and peter.]

Page 43: Memoirs of a Graph Addict: Despair to Redemption

Collective Decision Making: Rise of the Machines

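[Figure: two panels over the vertices marko, josh, pavel, and peter, titled Direct Democracy and Dynamically Distributed Democracy.]

Two examples will be presented for the same decision making scenario. One using direct democracy as the aggregation algorithm and one using dynamically distributed democracy as the aggregation algorithm.[11]

[11] Rodriguez, M.A., Watkins, J.H., “Revisiting the Age of Enlightenment from a Collective Decision Making Systems Perspective,” First Monday, 14(8), 2009. [http://arxiv.org/abs/0901.3929]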

11Rodriguez, M.A., Watkins, J.H., “Revisiting the Age of Enlightenment from a Collective DecisionMaking Systems Perspective,” First Monday, 14(8), 2009. [http://arxiv.org/abs/0901.3929]

Page 44: Memoirs of a Graph Addict: Despair to Redemption

Collective Decision Making: Direct Democracy

• “What percentage of our crop yield should we store as reserves?”

• The outcome is represented as a real value in [0, 1].

• Each individual has their opinion of the situation.

  * Marko (80% should be stored.)

  * Peter (50% should be stored.)

  * Josh (80% should be stored.)

  * Pavel (90% should be stored.)

[Figure: the four vertices with their opinions: marko 0.8, josh 0.8, pavel 0.9, peter 0.5.]

Page 45: Memoirs of a Graph Addict: Despair to Redemption

Collective Decision Making: Direct Democracy

• In a direct democracy, everyone voices their opinion.

• The average of all voiced opinions is the final decision (even in binary decisions).

• For our society of 4, a pure direct democracy would yield (0.8 + 0.5 + 0.8 + 0.9)/4 = 0.75.

[Figure: the four vertices with their opinions: marko 0.8, josh 0.8, pavel 0.9, peter 0.5.]

Page 46: Memoirs of a Graph Addict: Despair to Redemption

Collective Decision Making: Direct Democracy

• If an individual abstains from participation, then their opinion is not considered.

• Assume only Peter and Pavel are there to participate. Marko and Josh are out hunting.

• For our society of 4 (with 2 voters), a pure direct democracy would yield (0.5 + 0.9)/2 = 0.7, an error of |0.75 − 0.7| = 0.05.

[Figure: the four vertices with their opinions: marko 0.8, josh 0.8, pavel 0.9, peter 0.5.]
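The direct democracy arithmetic on these slides can be checked with a short sketch. The opinions come from the slides; the function name and data layout are assumptions made for the example.

```python
opinions = {"marko": 0.8, "peter": 0.5, "josh": 0.8, "pavel": 0.9}


def direct_democracy(opinions, participants):
    """Average the opinions of those who actually voice them."""
    voiced = [opinions[name] for name in participants]
    return sum(voiced) / len(voiced)


full = direct_democracy(opinions, opinions)              # everyone participates
partial = direct_democracy(opinions, ["peter", "pavel"])  # marko and josh abstain
error = abs(full - partial)

# Rounded for display, since these are floating point values.
print(round(full, 2), round(partial, 2), round(error, 2))
```

The error measures how far the decision of the two voters drifts from what the full society of four would have decided.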

Page 47: Memoirs of a Graph Addict: Despair to Redemption

Collective Decision Making: Representative Democracy

• Thomas Paine stated that when populations are small “some convenient tree will afford them a State house”, but as the population increases it becomes a necessity for representatives to “act in the same manner as the whole body would act were they present.”[12][13]

[12] Paine, T., “Common Sense,” 1776.

[13] The role of the representative as an expert vs. a model is argued at length in Pitkin, H.F., “The Concept of Representation,” University of California Press, 1972.

Page 48: Memoirs of a Graph Addict: Despair to Redemption

Collective Decision Making: DDD

• Dynamically distributed democracy (DDD) strikes a balance between direct and representative democracy.

• An individual is at least a representative of themselves.

• An individual can also wield the power of those that abstain from participation.

• Dynamically distributing representative power is the purpose of the algorithm.

Page 49: Memoirs of a Graph Addict: Despair to Redemption

Collective Decision Making: DDD

• Peter believes that Josh and Marko are good decision makers.

• When Peter abstains, Marko and Josh wield his social power in equal parts (0.5).

• Like a friendship graph, but the edges denote “trust.”

  * “I believe that X has identical values to me and will behave as I do.”

  * “I believe that X is more expert than I and should make decisions.”

[Figure: the vertices marko, josh, pavel, and peter, with two trust edges of weight 0.5 leaving peter.]

Page 50: Memoirs of a Graph Addict: Despair to Redemption

Collective Decision Making: DDD

• Marko believes Josh is the key to humanity.

• Josh prefers people closer to his eastern home of former China.

• Pavel is of the former Soviet Union, and simply has no faith in anyone.

[Figure: the trust graph over marko, josh, pavel, and peter, with edge weights 0.5, 0.5, 1.0, 0.75, and 0.25.]

Page 51: Memoirs of a Graph Addict: Despair to Redemption

Collective Decision Making: DDD

[Figure: the trust graph over marko, josh, pavel, and peter, with edge weights 0.5, 0.5, 1.0, 0.75, and 0.25.]

This is the trust-based social graph. Individuals can add/remove outgoing edges from their vertex as they please. When decisions are required, the current snapshot of the graph is used to compute the collective decision.

Page 52: Memoirs of a Graph Addict: Despair to Redemption

Collective Decision Making: DDD

• In a dynamically distributed democracy, everyone can voice their opinion.

• The weighted average of all voiced opinions is the final decision.

• For our society of 4, a pure direct democracy would yield (0.8 + 0.5 + 0.8 + 0.9)/4 = 0.75.

• When everyone participates, it's a direct democracy.

[Figure: the trust graph over marko, josh, pavel, and peter, with edge weights 0.5, 0.5, 1.0, 0.75, and 0.25.]

Page 53: Memoirs of a Graph Addict: Despair to Redemption

Collective Decision Making: DDD

• Assume Marko and Josh go hunting, again. By abstaining, they diffuse their vote power over their outgoing edges.

• By participating, Peter and Pavel aggregate vote power through their incoming edges.

• This diffusion process continues until all power has aggregated at participating individuals.

[Figure: the trust graph with opinions (marko 0.8, josh 0.8, pavel 0.9, peter 0.5) and edge weights 0.5, 0.5, 1.0, 0.75, and 0.25. Every vertex starts with a vote power of 1.0.]

Page 54: Memoirs of a Graph Addict: Despair to Redemption

Collective Decision Making: DDD

• Note that Marko fully trusts Josh's decision making abilities.

• However, given that Josh is not participating, Marko is implicitly stating that he trusts Josh’s decision in choosing decision makers.

• Thus, Josh serves to route Marko’s power.

[Figure: the trust graph with opinions (marko 0.8, josh 0.8, pavel 0.9, peter 0.5) and edge weights 0.5, 0.5, 1.0, 0.75, and 0.25. Vote powers shown part-way through the diffusion: 1.0, 1.75, and 1.25.]

Page 55: Memoirs of a Graph Addict: Despair to Redemption

Collective Decision Making: DDD

• In the end, Peter and Pavel have aggregated all the energy in the graph (albeit, to different degrees).

• Now a weighted direct democracy is used to calculate the collective decision.

• The collective vote is ((1.5 · 0.5) + (2.5 · 0.9))/4 = 0.75, an error of |0.75 − 0.75| = 0.0.

[Figure: the trust graph with opinions (marko 0.8, josh 0.8, pavel 0.9, peter 0.5) and edge weights 0.5, 0.5, 1.0, 0.75, and 0.25. Final vote powers: 1.5 for peter and 2.5 for pavel.]
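The diffusion of vote power walked through on the preceding slides can be sketched as a small loop. This is an illustration under assumptions, not the authors' implementation: the function names, the stopping rule, and the dictionary layout are invented here, and the direction of Josh's two edges (0.75 to Pavel, 0.25 to Peter) is inferred from the final vote powers of 2.5 and 1.5 rather than stated in the transcript.

```python
opinions = {"marko": 0.8, "peter": 0.5, "josh": 0.8, "pavel": 0.9}

# trust[x][y] is the share of x's vote power that flows to y when x abstains.
trust = {
    "peter": {"marko": 0.5, "josh": 0.5},
    "marko": {"josh": 1.0},
    "josh": {"pavel": 0.75, "peter": 0.25},  # assumed direction of the two edges
    "pavel": {},
}


def distribute_power(trust, participants, max_rounds=100, eps=1e-12):
    """Diffuse vote power from abstainers until it rests at participants."""
    power = {name: 1.0 for name in trust}
    for _ in range(max_rounds):
        nxt = {name: 0.0 for name in trust}
        moved = 0.0
        for name, amount in power.items():
            if name in participants or not trust[name]:
                nxt[name] += amount  # participants (and dead ends) keep power
            else:
                for target, weight in trust[name].items():
                    nxt[target] += amount * weight
                moved += amount
        power = nxt
        if moved < eps:
            break
    return power


def collective_vote(opinions, power, participants):
    """Average of the participants' opinions, weighted by their vote power."""
    total = sum(power[name] for name in participants)
    return sum(power[name] * opinions[name] for name in participants) / total


participants = {"peter", "pavel"}
power = distribute_power(trust, participants)
vote = collective_vote(opinions, power, participants)
print(power["peter"], power["pavel"], round(vote, 2))
```

After the first round the powers are 1.25 for Peter, 1.75 for Pavel, and 1.0 still held by Josh, matching the intermediate slide; after the second round all power rests with the two participants. The `max_rounds` cap is a guard for trust graphs in which abstainers only trust each other in a cycle.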

Page 56: Memoirs of a Graph Addict: Despair to Redemption

Collective Decision Making: DDD

[Figure: two plots comparing dynamically distributed democracy with direct democracy. One plots error (0.00 to 0.20) against the percentage of active citizens (100 down to 0). The other plots the proportion of correct decisions (0.50 to 0.95) against the percentage of active citizens (100 down to 0).]

Fig. 5. The relationship between $k$ and $e^{vote}_k$ for direct democracy (gray line) and dynamically distributed democracy (black line). The plot provides the proportion of identical, correct decisions over a simulation that was run with 1000 artificially generated networks composed of 100 citizens each.

As previously stated, let $x \in [0, 1]^n$ denote the political tendency of each citizen in this population, where $x_i$ is the tendency of citizen $i$ and, for the purpose of simulation, is determined from a uniform distribution. Assume that every citizen in a population of $n$ citizens uses some social network-based system to create links to those individuals that they believe reflect their tendency the best. In practice, these links may point to a close friend, a relative, or some public figure whose political tendencies resonate with the individual. In other words, representatives are any citizens, not political candidates that serve in public office. Let $A \in [0, 1]^{n \times n}$ denote the link matrix representing the network, where the weight of an edge, for the purpose of simulation, is denoted

$$A_{i,j} = \begin{cases} 1 - |x_i - x_j| & \text{if link exists} \\ 0 & \text{otherwise.} \end{cases}$$

In words, if two linked citizens are identical in their political tendency, then the strength of the link is 1.0. If their tendencies are completely opposing, then their trust (and the strength of the link) is 0.0. Note that a preferential attachment network growth algorithm is used to generate a degree distribution that is reflective of typical social networks “in the wild” (i.e. scale-free properties). Moreover, an assortativity parameter is used to bias the connections in the network towards citizens with similar tendencies. The assumption here is that given a system of this nature, it is more likely for citizens to create links to similar-minded individuals than to those whose opinions are quite different. The resultant link matrix $A$ is then normalized to be row stochastic in order to generate a probability distribution over the weights of the outgoing edges of a citizen. Figure 6 presents an example of an $n = 100$ artificially generated trust-based social network, where red denotes a tendency of 0.0, purple a tendency of 0.5, and blue a tendency of 1.0.

Given this social network infrastructure, it is possible to better ensure that the collective tendency and vote is appropriately represented through a weighting of the active, participating population. Every citizen, active or not, is initially provided with $1/n$ “vote power” and this is represented in the vector $\pi \in \mathbb{R}^n_+$, such that the total amount of vote power in the population is 1. Let $y \in \mathbb{R}^n_+$ denote the total amount of vote power that has flowed to each citizen over the course of the algorithm. Finally, $a \in \{0, 1\}^n$ denotes whether citizen $i$ is participating ($a_i = 1$) in the current decision making process or not ($a_i = 0$). The values of $a$ are biased by an unfair coin that has probability $k$ of making the citizen an active participant and $1 - k$ of making the citizen inactive. The iterative algorithm is presented below, where $\circ$ denotes entry-wise multiplication and $\epsilon \approx 1$.

Fig. 6. A visualization of a network of trust links between citizens. Each citizen’s color denotes their “political tendency”, where full red is 0, full blue is 1, and purple is 0.5. The layout algorithm chosen is the Fruchterman-Reingold layout.

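$$
\begin{aligned}
& y \leftarrow 0 \\
& \textbf{while } \textstyle\sum_{i=1}^{i \leq n} y_i < \epsilon \textbf{ do} \\
& \quad y \leftarrow y + (\pi \circ a) \\
& \quad \pi \leftarrow \pi \circ (1 - a) \\
& \quad \pi \leftarrow A\pi \\
& \textbf{end}
\end{aligned}
$$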

In words, active citizens serve as vote power “sinks” in that once they receive vote power, from themselves or from a neighbor in the network, they do not pass it on. Inactive citizens serve as vote power “sources” in that they propagate their vote power over the network links to their neighbors iteratively until all (or $\epsilon$) vote power has reached active citizens. At this point, the tendency in the active population is defined as $\delta^{tend} = x \cdot y$. Figure 4 plots the error incurred using dynamically distributed democracy (black line), where the error is defined as

$$e^{tend}_k = |d^{tend}_{100} - \delta^{tend}_k|.$$

Next, the collective vote $\delta^{vote}_k$ is determined by a weighted majority as dictated by the vote power accumulated by active participants. Figure 5 plots the proportion of votes that are different from what a fully participating population would

• As participation wanes, dynamically distributed democracy is able to simulate direct democracy.[14]

[14] Rodriguez, M.A., Steinbock, D.J., “A Social Network for Societal-Scale Decision-Making Systems,” Proceedings of the Computational Social and Organizational Science Conference, 2004. [http://arxiv.org/abs/cs/0412047]

Page 57: Memoirs of a Graph Addict: Despair to Redemption

Collective Decision Making: Techno-Government

• In this model of decision making, there is no governmental body.

• Power is determined when a decision is needed.

• How are bills created? Wikilegislature?[15]

• What about different types of trust (e.g. “Marko trusts Josh in engineering decisions only.”)? Hint: Multi-relational+ graphs. Tagging legislature and tagging trust.[16]

[15] Turoff, M., Roxanne-Hiltz, S., Bieber, M., Rana, A., “Collaborative Discourse Structures in Computer Mediated Group Communications”, Hawaii International Conference on Systems Science (HICSS), 1998. [http://web.njit.edu/~turoff/Papers/CDSCMC/CDSCMC.htm]

[16] Rodriguez, M.A., “Social Decision Making with Multi-Relational Networks and Grammar-Based Particle Swarms,” Hawaii International Conference on Systems Science (HICSS), pp. 39–49, 2007. [http://arxiv.org/abs/cs/0609034]

Page 58: Memoirs of a Graph Addict: Despair to Redemption

“The founders of modern democracies provided a moral heritage that remains highly regarded in societies today. However, it should be remembered that it is the ideals that are valuable, not the specific implementation of the systems that protect and support them. If there is another implementation of government that better realizes these ideals, then, by the rights of man, it must be enacted.”[17]

– Michael Scott

[17] Rodriguez, M.A., Watkins, J.H., “Revisiting the Age of Enlightenment from a Collective Decision Making Systems Perspective,” First Monday, 14(8), University of Illinois at Chicago Library, 2009. [http://arxiv.org/abs/0901.3929]

Page 59: Memoirs of a Graph Addict: Despair to Redemption

Eudaemonic Engine: Seeking Virtue through Circuitry

The year is 2018.

Page 60: Memoirs of a Graph Addict: Despair to Redemption

Human life on earth has stabilized.

Page 61: Memoirs of a Graph Addict: Despair to Redemption

Humans no longer struggle to survive. They struggle for eudaemonia. They seek the “good daemon” within...

Page 62: Memoirs of a Graph Addict: Despair to Redemption

Eudaemonic Engine: Aristotle

• Being virtuous is repeatedly choosing correctly.

• Habitual correct behavior leads to eudaemonia – complete engagement in the world (a complete sense of engagement/acceptance).[18][19]

• Can systems aid individuals in choosing correctly – in all aspects of life?

[Images: David L. Norton and Aristotle.]

[18] Aristotle, “Nicomachean Ethics”, 350 B.C.

[19] Mihaly Csikszentmihalyi, “Flow: The Psychology of Optimal Experience”, Harper Perennial, 1990.

Page 63: Memoirs of a Graph Addict: Despair to Redemption

Eudaemonic Engine: Resource Modeling

But if the development of character is the moral objective, it is obvious that [...] the choices of vocation and avocations to pursue, of friends to cultivate, of books to read are moral for they clearly influence such development.[20]

• Web services are continuing to build richer models of humans, resources, and the relationships between them.

• There exists an increasing reliance on such services to aid in decision making: correct books (Amazon.com), correct movies (NetFlix.com), correct music (Pandora), correct occupation (Monster.com), correct friends (PointsCommuns.com), correct life partner (Match.com), etc.[21]

[20] David L. Norton, “Democracy and Moral Development: A Politics of Virtue”, University of California Press, 1991.

[21] Rodriguez, M.A., Watkins, J., “Faith in the Algorithm, Part 2: Computational Eudaemonics,” Proceedings of the International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, 5712, pp. 813–820, 2009. [http://arxiv.org/abs/0904.0027]

Page 64: Memoirs of a Graph Addict: Despair to Redemption

Eudaemonic Engine: Mapping Person to Resource

[Figure: over time, a person is mapped to actions on resources: watch a movie, read an article, listen to music, meet a friend, eat food.]

Map an individual to actions on resources. However, how do we model/expose the resources of the world?

Page 65: Memoirs of a Graph Addict: Despair to Redemption

Model

Page 66: Memoirs of a Graph Addict: Despair to Redemption

Eudaemonic Engine: The Web of Data

[Figure: the cloud of interlinked data sets on the Web of Data. Data sets shown: geospecies, freebase, dbpedia, libris, geneid, interpro, hgnc, symbol, pubmed, mgi, geneontology, uniprot, pubchem, unists, omim, homologene, pfam, pdb, reactome, chebi, uniparc, kegg, cas, uniref, prodom, prosite, taxonomy, dailymed, linkedct, acm, dblprkbexplorer, laascnrs, newcastle, eprints, ecssouthampton, irittoulouse, citeseer, pisa, resex, ibm, ieee, rae2001, budapestbme, eurecom, dblphannover, diseasome, drugbank, geonames, yago, opencyc, w3cwordnet, umbel, linkedmdb, rdfbookmashup, flickrwrappr, surgeradio, musicbrainz, myspacewrapper, bbcplaycountdata, bbcprogrammes, semanticweborg, revyu, swconferencecorpus, lingvoj, pubguide, crunchbase, foafprofiles, riese, qdos, audioscrobbler, flickrexporter, bbcjohnpeel, wikicompany, govtrack, uscensusdata, openguides, doapspace, bbclatertotp, eurostat, semwebcentral, dblpberlin, siocsites, jamendo, magnatune, worldfactbook, projectgutenberg, opencalais, rdfohloh, virtuososponger.]

Page 67: Memoirs of a Graph Addict: Despair to Redemption

Eudaemonic Engine: URIs of the Web of Data

[Figure: a small RDF graph spanning two data sets, DBPEDIA and FLICKR. Resources shown: http://dbpedia.org/resource/The Fountainhead, dbpedia:Fountain_Head, dbpedia:Ayn_Rand, dbpedia:Book, flickr:Ayn_Rand, and http://www4.wiwiss.fu-berlin.de/flickrwrappr/photos/Ayn_Rand. Edge labels shown: dbpedia:author, rdf:type, dbpprop:hasPhotoCollection, and foaf:depiction.]

Page 68: Memoirs of a Graph Addict: Despair to Redemption

Eudaemonic Engine: Datasets on the Web of Data

data set         | domain     | data set         | domain     | data set           | domain
------------------------------------------------------------------------------------------
audioscrobbler   | music      | govtrack         | government | pubguide           | books
bbclatertotp     | music      | homologene       | biology    | qdos               | social
bbcplaycountdata | music      | ibm              | computer   | rae2001            | computer
bbcprogrammes    | media      | ieee             | computer   | rdfbookmashup      | books
budapestbme      | computer   | interpro         | biology    | rdfohloh           | social
chebi            | biology    | jamendo          | music      | resex              | computer
crunchbase       | business   | laascnrs         | computer   | riese              | government
dailymed         | medical    | libris           | books      | semanticweborg     | computer
dblpberlin       | computer   | lingvoj          | reference  | semwebcentral      | social
dblphannover     | computer   | linkedct         | medical    | siocsites          | social
dblprkbexplorer  | computer   | linkedmdb        | movie      | surgeradio         | music
dbpedia          | general    | magnatune        | music      | swconferencecorpus | computer
doapspace        | social     | musicbrainz      | music      | taxonomy           | reference
drugbank         | medical    | myspacewrapper   | social     | umbel              | general
eurecom          | computer   | opencalais       | reference  | uniref             | biology
eurostat         | government | opencyc          | general    | unists             | biology
flickrexporter   | images     | openguides       | reference  | uscensusdata       | government
flickrwrappr     | images     | pdb              | biology    | virtuososponger    | reference
foafprofiles     | social     | pfam             | biology    | w3cwordnet         | reference
freebase         | general    | pisa             | computer   | wikicompany        | business
geneid           | biology    | prodom           | biology    | worldfactbook      | government
geneontology     | biology    | projectgutenberg | books      | yago               | general
geonames         | geographic | prosite          | biology    | . . .              |

Page 69: Memoirs of a Graph Addict: Despair to Redemption

Eudaemonic Engine: Transforms Development

A new application development paradigm emerges. No longer do data and application providers need to be the same entity (left). With the Web of Data, it’s possible for developers to write applications that utilize data that they do not maintain (right).22

[Figure: two panels, each showing Application 1, Application 2, and Application 3 hosted at 127.0.0.1, 127.0.0.2, and 127.0.0.3 with their structures and processes; one panel is labeled Web of Data.]

22 Rodriguez, M.A., “A Reflection on the Structure and Process of the Web of Data,” Bulletin of the American Society for Information Science and Technology, 35(6), pp. 38–43, 2009. [http://arxiv.org/abs/0908.0373]

Page 70: Memoirs of a Graph Addict: Despair to Redemption

Now that there is a rich structure, what is the process?

Page 71: Memoirs of a Graph Addict: Despair to Redemption

Process

Page 72: Memoirs of a Graph Addict: Despair to Redemption

Eudaemonic Engine: Diffusion Processes on Graphs

A graph diffusion process will be used to determine the solution to one’s problems.

• Graph traversing can be seen as a diffusion process over a graph.

• “Energy” moves over a graph and reverberates in regions where there is recurrence (i.e. cycles).

• At some t in the future, the vertices with the greatest flow are the solution to the problem.

Page 73: Memoirs of a Graph Addict: Despair to Redemption

Eudaemonic Engine: Diffusion Processes on Graphs

Page 74: Memoirs of a Graph Addict: Despair to Redemption

Eudaemonic Engine: Diffusion Processes on Graphs

Page 75: Memoirs of a Graph Addict: Despair to Redemption

Eudaemonic Engine: Diffusion Processes on Graphs

Page 76: Memoirs of a Graph Addict: Despair to Redemption

Eudaemonic Engine: Diffusion Processes on Graphs

Page 77: Memoirs of a Graph Addict: Despair to Redemption

Eudaemonic Engine: Diffusion Processes on Graphs

Page 78: Memoirs of a Graph Addict: Despair to Redemption

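Implementing a diffusion process is easy when the edges of the graph are unlabeled.

flow = new HashMap<Vertex,Integer>();
current = Arrays.asList(startVertex);
steps = 10;
for(int i=0; i<steps; i++) {
  // collectMany flattens the per-vertex neighbor lists into one list
  current = current.collectMany{ it.getAdjacentVertices() }
  // treat a missing entry as 0 so the first visit does not add to null
  current.each{ flow[it] = (flow[it] ?: 0) + 1 }
}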
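The same idea can be sketched in plain Java without any graph library. This is only an illustration under stated assumptions: the graph is a hypothetical adjacency map from vertex name to neighbor names, and the class and method names (UnlabeledDiffusion, diffuse) are invented for this example rather than taken from the talk.

```java
import java.util.*;

final class UnlabeledDiffusion {
    /** Counts how often each vertex is reached while walkers spread for `steps` hops. */
    static Map<String, Integer> diffuse(Map<String, List<String>> adjacency,
                                        String start, int steps) {
        Map<String, Integer> flow = new TreeMap<>();
        List<String> current = Collections.singletonList(start);
        for (int i = 0; i < steps; i++) {
            // Move every walker to all of its neighbors.
            List<String> next = new ArrayList<>();
            for (String v : current) {
                next.addAll(adjacency.getOrDefault(v, Collections.emptyList()));
            }
            // Each arrival adds one unit of flow to the vertex reached.
            for (String v : next) {
                flow.merge(v, 1, Integer::sum);
            }
            current = next;
        }
        return flow;
    }

    public static void main(String[] args) {
        // Hypothetical three-vertex graph with a cycle a -> c -> a.
        Map<String, List<String>> g = new HashMap<>();
        g.put("a", Arrays.asList("b", "c"));
        g.put("b", Arrays.asList("c"));
        g.put("c", Arrays.asList("a"));
        System.out.println(diffuse(g, "a", 3)); // prints {a=2, b=2, c=3}
    }
}
```

The vertex c sits on every cycle of this small graph, so it accumulates the most flow, which is the "reverberation" effect the slides describe.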

Page 79: Memoirs of a Graph Addict: Despair to Redemption

Eudaemonic Engine: Diffusion on a Property Graph?

[Figure: toy property graph. Vertex labels shown: marko, 24, jen, peter, emil, The Wire, True Blood, linkedprocess, gremlin, intelligence graphs. Edge labels shown: knows, likes, wrote, occupation, tagged.]

With different types of things being related by different types of relations, you need to specify legal paths for the energy to flow over.

Page 80: Memoirs of a Graph Addict: Despair to Redemption

Eudaemonic Engine: Diffusion on a Property Graph

• Problem statement = Start vertices + path expression.

• Problem solution = Highest energy vertices at t.23 24 25

23 Examples presented next are basic due to the simplicity of the toy graph example used. In such cases, queries as opposed to energy diffusions are best. In general, the purpose of an energy diffusion is to expose recurrence/feedback in the graph. For the more technically inclined, think of it as determining the eigenvector of the graph defined by the path expression.

24 Rodriguez, M.A., “Grammar-Based Random Walkers in Semantic Networks,” Knowledge-Based Systems, 21(7), pp. 727–739, 2008. [http://arxiv.org/abs/0803.4355]

25 Rodriguez, M.A., Neubauer, P., “A Path Algebra for Multi-Relational Graphs,” 2nd International Workshop on Graph Data Management (GDM11), 2010. [http://arxiv.org/abs/1011.0390]
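A minimal Java sketch of "start vertices + path expression" might look like the following. It assumes a simplified path expression (an ordered list of edge labels, followed in the outgoing direction only), and the edge data, class names, and method names are hypothetical stand-ins rather than the speaker's implementation.

```java
import java.util.*;

final class PathDiffusion {
    /** A directed, labeled edge: tail --label--> head. */
    static final class Edge {
        final String tail, label, head;
        Edge(String tail, String label, String head) {
            this.tail = tail; this.label = label; this.head = head;
        }
    }

    /** Moves one walker per start vertex along the labels in `path`, in order,
     *  and returns the energy (walker count) at each vertex reached at the end. */
    static Map<String, Integer> diffuse(List<Edge> edges, List<String> starts,
                                        List<String> path) {
        List<String> current = new ArrayList<>(starts);
        for (String label : path) {
            List<String> next = new ArrayList<>();
            for (String v : current) {
                for (Edge e : edges) {
                    // Only edges with the legal label carry energy forward.
                    if (e.tail.equals(v) && e.label.equals(label)) next.add(e.head);
                }
            }
            current = next;
        }
        Map<String, Integer> energy = new TreeMap<>();
        for (String v : current) energy.merge(v, 1, Integer::sum);
        return energy;
    }

    /** Hypothetical edges, loosely modeled on the toy graph's labels. */
    static List<Edge> toyEdges() {
        return Arrays.asList(
            new Edge("marko", "knows", "peter"),
            new Edge("marko", "knows", "emil"),
            new Edge("marko", "wrote", "gremlin"),
            new Edge("peter", "likes", "True Blood"),
            new Edge("emil", "likes", "True Blood"),
            new Edge("emil", "likes", "The Wire"));
    }

    public static void main(String[] args) {
        System.out.println(diffuse(toyEdges(), Arrays.asList("marko"),
                                   Arrays.asList("knows", "likes")));
        // prints {The Wire=1, True Blood=2}
    }
}
```

Because the wrote edge is not part of the path expression, no energy reaches gremlin; the highest-energy vertex (True Blood) is the "solution" for this start vertex and path.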

Page 81: Memoirs of a Graph Addict: Despair to Redemption

Eudaemonic Engine: Friend Recommendation

[Figure: the toy property graph from Page 79 (vertex labels marko, 24, jen, peter, emil, The Wire, True Blood, linkedprocess, gremlin, intelligence graphs; edge labels knows, likes, wrote, occupation, tagged).]

“Who are my friends’ friends that are not me or my friends?”26

26 marko.outE[[label:'knows']].inV.aggregate(x).outE.inV{!x.contains(it)}
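The question above can also be sketched in plain Java. This is an illustrative stand-in, not the Gremlin traversal itself: the knows relation is a hypothetical map from person to the people they know, and the names FriendRecommendation and recommend are assumptions made for this example.

```java
import java.util.*;

final class FriendRecommendation {
    /** Friends of my friends, excluding me and the people I already know. */
    static Set<String> recommend(Map<String, Set<String>> knows, String me) {
        Set<String> friends = knows.getOrDefault(me, Collections.emptySet());
        Set<String> result = new TreeSet<>();
        for (String friend : friends) {
            // Second hop: everyone my friend knows.
            result.addAll(knows.getOrDefault(friend, Collections.emptySet()));
        }
        result.removeAll(friends); // not my friends
        result.remove(me);         // not me
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical data; the slide's figure does not specify every edge.
        Map<String, Set<String>> knows = new HashMap<>();
        knows.put("marko", new HashSet<>(Arrays.asList("peter", "emil")));
        knows.put("peter", new HashSet<>(Arrays.asList("marko", "jen")));
        knows.put("emil", new HashSet<>(Arrays.asList("peter", "pavel")));
        System.out.println(recommend(knows, "marko")); // prints [jen, pavel]
    }
}
```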

Page 82: Memoirs of a Graph Addict: Despair to Redemption

Eudaemonic Engine: Product Recommendation

[Figure: the toy property graph from Page 79 (vertex labels marko, 24, jen, peter, emil, The Wire, True Blood, linkedprocess, gremlin, intelligence graphs; edge labels knows, likes, wrote, occupation, tagged).]

“Who likes what I like? Of those things they like, what else do they like that I don’t already like?”27

27 marko.outE[[label:'likes']].inV.aggregate(x).inE[[label:'likes']].outV.outE[[label:'likes']].inV{!x.contains(it)}
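As a rough Java illustration of this likes-based traversal, the sketch below scores each unseen item by the number of paths me -likes-> item <-likes- other -likes-> candidate. The data and the names ProductRecommendation and recommend are hypothetical; the scoring mirrors what counting the traversal's results would give, under that assumption.

```java
import java.util.*;

final class ProductRecommendation {
    /** Scores items I do not like yet by how many shared-like paths lead to them. */
    static Map<String, Integer> recommend(Map<String, Set<String>> likes, String me) {
        Set<String> mine = likes.getOrDefault(me, Collections.emptySet());
        Map<String, Integer> scores = new TreeMap<>();
        for (Map.Entry<String, Set<String>> other : likes.entrySet()) {
            if (other.getKey().equals(me)) continue;
            // Each shared like is one path from me to this other person.
            int shared = 0;
            for (String item : other.getValue()) {
                if (mine.contains(item)) shared++;
            }
            if (shared == 0) continue;
            // Every path continues to each of their likes that is new to me.
            for (String item : other.getValue()) {
                if (!mine.contains(item)) scores.merge(item, shared, Integer::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        // Hypothetical likes; not read off the slide's figure.
        Map<String, Set<String>> likes = new HashMap<>();
        likes.put("marko", new HashSet<>(Arrays.asList("The Wire", "True Blood")));
        likes.put("jen", new HashSet<>(Arrays.asList("The Wire", "True Blood", "gremlin")));
        likes.put("peter", new HashSet<>(Arrays.asList("True Blood", "linkedprocess")));
        likes.put("emil", new HashSet<>(Arrays.asList("graphs")));
        System.out.println(recommend(likes, "marko")); // prints {gremlin=2, linkedprocess=1}
    }
}
```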

Page 83: Memoirs of a Graph Addict: Despair to Redemption

Eudaemonic Engine: Product Recommendation 2

[Figure: the toy property graph from Page 79 (vertex labels marko, 24, jen, peter, emil, The Wire, True Blood, linkedprocess, gremlin, intelligence graphs; edge labels knows, likes, wrote, occupation, tagged).]

“Who likes what I like and what do they like? What do the people I know like? Of those things liked, what do I not already like?”

Page 84: Memoirs of a Graph Addict: Despair to Redemption

Eudaemonic Engine: Recommendation

• Different paths through a domain model expose different types of recommendations.

• Individual path preferences allow for an ecosystem of traversals (different problems can be solved over the same domain model).28 29 30

28 Rodriguez, M.A., Allen, D.W., Shinavier, J., Ebersole, G., “A Recommender System to Support the Scholarly Communication Process,” 2009. [http://arxiv.org/abs/0905.1594]

29 Rodriguez, M.A., “Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Recommendation,” Technical Talk Seminar, AT&T Interactive, 2010. [http://slidesha.re/bOCy4Q]

30 Traversal Patterns with Gremlin available at https://github.com/tinkerpop/gremlin/wiki/Traversal-Patterns.

Page 85: Memoirs of a Graph Addict: Despair to Redemption

Universal Computer: A Single Computational Substrate

The year is 2023.

Page 86: Memoirs of a Graph Addict: Despair to Redemption

Life is good. Humans flourish. Virtuous men’s minds are filled with wonderfully creative ideas. Inventions proliferate.

Page 87: Memoirs of a Graph Addict: Despair to Redemption

Advances in computer network technology yield a new model of computing.

Computer networks are no longer the bottleneck for speed. Accessing local and remote data is no longer considered “different.” The distinction between RAM, disk drive, and Web disappears.

Page 88: Memoirs of a Graph Addict: Despair to Redemption

Universal Computer: A Computational Substrate

On the Web...

• Represent data.

• Represent code.

• Represent virtual machines.

Page 89: Memoirs of a Graph Addict: Despair to Redemption

Universal Computer: Represent Data

• URIs form an infinite universal address space.

• A URI can denote a datum.

  - http://markorodriguez.com#self (Marko)
  - http://sws.geonames.org/4887398/about.rdf (Chicago)
  - http://data.nytimes.com/N38395718310308503251 (Malmö)

• RDF (Resource Description Framework) is a data model for linking URIs into a multi-relational graph.

Page 90: Memoirs of a Graph Addict: Despair to Redemption

Universal Computer: Represent Data

[Figure: RDF graph spanning two servers, 127.0.0.1 and 127.0.0.2. atti:marko is linked to nm:puppy by atti:bestFriend. atti:marko has atti:numberOfLegs "2"^^xsd:integer and atti:hasFur "false"^^xsd:boolean; nm:puppy has atti:numberOfLegs "4"^^xsd:integer and atti:hasFur "true"^^xsd:boolean.]

• The concept of atti:marko and the properties atti:numberOfLegs, atti:hasFur, and atti:bestFriend are maintained by the AT&Ti graph server.

• The concept of nm:puppy is maintained by a New Mexico graph server.

• The data types xsd:integer and xsd:boolean are maintained by the XML standards organization.
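The statements in this slide can be written down as subject-predicate-object triples. The Java below is a minimal sketch of that idea only; it is not an RDF library, and the Triple class, the exampleGraph data layout, and the objects helper are assumptions introduced for illustration.

```java
import java.util.*;

final class TripleStoreSketch {
    /** One RDF-style statement: subject --predicate--> object. */
    static final class Triple {
        final String subject, predicate, object;
        Triple(String subject, String predicate, String object) {
            this.subject = subject; this.predicate = predicate; this.object = object;
        }
    }

    /** The statements described on the slide, as plain strings. */
    static List<Triple> exampleGraph() {
        return Arrays.asList(
            new Triple("atti:marko", "atti:bestFriend", "nm:puppy"),
            new Triple("atti:marko", "atti:numberOfLegs", "\"2\"^^xsd:integer"),
            new Triple("atti:marko", "atti:hasFur", "\"false\"^^xsd:boolean"),
            new Triple("nm:puppy", "atti:numberOfLegs", "\"4\"^^xsd:integer"),
            new Triple("nm:puppy", "atti:hasFur", "\"true\"^^xsd:boolean"));
    }

    /** All objects o such that (subject, predicate, o) is in the graph. */
    static List<String> objects(List<Triple> graph, String subject, String predicate) {
        List<String> result = new ArrayList<>();
        for (Triple t : graph) {
            if (t.subject.equals(subject) && t.predicate.equals(predicate)) {
                result.add(t.object);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Triple> g = exampleGraph();
        // Follow atti:bestFriend, then read the friend's number of legs.
        String friend = objects(g, "atti:marko", "atti:bestFriend").get(0);
        System.out.println(friend + " " + objects(g, friend, "atti:numberOfLegs"));
    }
}
```

Note that the traversal crosses from an atti: URI to an nm: URI without any special handling, which is the point of a single shared address space.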

Page 91: Memoirs of a Graph Addict: Despair to Redemption

Universal Computer: Represent Code

• Computing is a series of instructions — add, write, branch, goto...

• The URI address space and RDF glue can be seen as a computational medium.31

[Figure: an add instruction encoded in RDF. The blank node _:123 has rdf:type atti:Add, atti:left-op "3"^^xsd:int, and atti:right-op "7"^^xsd:int. atti:Add is a subclass of atti:Instruction (the edge is labeled rdf:subClassOf on the slide; the RDF Schema term is rdfs:subClassOf).]

31 Rodriguez, M.A., “General-Purpose Computing on a Semantic Network Substrate,” Emergent Web Intelligence: Advanced Semantic Technologies, eds. R. Chbeir, A. Hassanien, A. Abraham, and Y. Badr, pp. 57–104, 2010. [http://arxiv.org/abs/0704.3395]
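To make "code as triples" concrete, here is a small hypothetical Java interpreter for the add instruction in the figure. The triple layout follows the slide's labels, but the class InstructionSketch, its helper methods, and the literal parsing are assumptions for illustration; this is not the RDF virtual machine described in the cited work.

```java
final class InstructionSketch {
    /** The add instruction from the slide as subject, predicate, object rows. */
    static final String[][] TRIPLES = {
        {"_:123", "rdf:type", "atti:Add"},
        {"_:123", "atti:left-op", "\"3\"^^xsd:int"},
        {"_:123", "atti:right-op", "\"7\"^^xsd:int"},
        {"atti:Add", "rdfs:subClassOf", "atti:Instruction"},
    };

    /** Returns the first object for (subject, predicate), or fails if none exists. */
    static String object(String[][] triples, String subject, String predicate) {
        for (String[] t : triples) {
            if (t[0].equals(subject) && t[1].equals(predicate)) return t[2];
        }
        throw new IllegalArgumentException("no triple for " + subject + " " + predicate);
    }

    /** Extracts the integer from a typed literal such as "3"^^xsd:int. */
    static int intValue(String literal) {
        return Integer.parseInt(literal.substring(1, literal.indexOf('"', 1)));
    }

    /** Executes the instruction node by reading its type and operands from the graph. */
    static int execute(String[][] triples, String instruction) {
        String type = object(triples, instruction, "rdf:type");
        if (!"atti:Add".equals(type)) {
            throw new UnsupportedOperationException("unknown instruction type: " + type);
        }
        return intValue(object(triples, instruction, "atti:left-op"))
             + intValue(object(triples, instruction, "atti:right-op"));
    }

    public static void main(String[] args) {
        System.out.println(execute(TRIPLES, "_:123")); // prints 10
    }
}
```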

Page 92: Memoirs of a Graph Addict: Despair to Redemption

Universal Computer: Represent Code

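[Figure: RDF graph of a method attached to data. Labels shown: atti:marko, nm:puppy, atti:bestFriend, atti:pet, atti:hasMethod, atti:isHappy "false"^^xsd:boolean, a Method node _:1234 with atti:args ("animal"^^xsd:string, rdf:1) and atti:block, blank nodes _:2345 and _:3456 linked by atti:inst, and the comment "// make animal happy".]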

Represent methods and their instructions attached to objects/classes.

Page 93: Memoirs of a Graph Addict: Despair to Redemption

Universal Computer: Represent Virtual Machines

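[Figure: the method graph from Page 92 extended with a Virtual Machine node. Labels shown: atti:marko, nm:puppy, atti:bestFriend, atti:pet, atti:hasMethod, atti:isHappy "false"^^xsd:boolean, atti:block, _:2345, _:3456, atti:inst, a node _:6789 with rdf:type atti:VM and an atti:pc edge, and the instruction "write "true"^^xsd:boolean".]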

Represent not only code, but the machines that execute it.

Page 94: Memoirs of a Graph Addict: Despair to Redemption

Universal Computer: Represent Virtual Machines

[Figure: UML-style class diagram of the virtual machine ontology. Classes shown: RVM, Fhat, Frame, FrameVariable, Block, BlockStack, OperandStack, ReturnStack, Instruction, rdfs:Resource. Properties shown: halt, methodReuse, programLocation, hasFrame, currentFrame, forFrame, fromBlock, hasSymbol, hasValue, returnTop, blockTop, operandTop, rdf:first, rdf:rest, rdf:li. Datatypes shown: xsd:string, xsd:boolean. Properties are annotated with cardinalities such as [0..1], [0..*], and [1].]

NenoFhat Project (circa 2006): http://neno.lanl.gov.

Page 95: Memoirs of a Graph Addict: Despair to Redemption

[Figure: layered diagram. Labels shown: API, Program, Machine Architecture, Virtual Machine State, Virtual Machine Processes, Physical Machines (127.0.0.1, 127.0.0.2, 127.0.0.3, 127.0.0.4), read/write, Global Data Structure, Data, Physics, My Belief in Reality.]

Page 96: Memoirs of a Graph Addict: Despair to Redemption

Universal Computer: A Ramification

• Data, APIs, code, machine architectures, and virtual machines are within the same global URI address space.

  - Code can be physically distributed across computers. For example, an add instruction on 127.0.0.1 references a branch instruction on 127.0.0.2.
  - Hardware machines can be added or removed without altering the state of computation, only the speed.
  - No developer concept of RAM-based memory addresses: the only address space is the space of all URIs.

Page 97: Memoirs of a Graph Addict: Despair to Redemption

Universal Computer: Another Ramification

• Reflection down to the machine level.32

  - Most languages support the manipulation of code at runtime. In this model, the virtual machine can be altered at runtime.
  - Code can rewrite the virtual machine that is evaluating the code (i.e. create lots of bugs).

32Rodriguez, M.A., The RDF Virtual Machine, LA-UR-08-03925, in review, 2009. [http://arxiv.org/abs/0802.3492]

Page 98: Memoirs of a Graph Addict: Despair to Redemption

The year is 2030.

Page 99: Memoirs of a Graph Addict: Despair to Redemption

Man learns to encode themselves into the URI address space...33 34

33 Egan, G., “Permutation City,” Eos Publisher, 1995.

34 Rodriguez, M.A., “From the Signal to the Symbol: Structure and Process in Artificial Intelligence,” Center for Nonlinear Studies Post Doctorate Seminar, Los Alamos National Laboratory, Los Alamos, New Mexico, 2008. [http://slidesha.re/hdqRn2]

Page 100: Memoirs of a Graph Addict: Despair to Redemption

Outline

• Graph Structures

• Graph Databases

• Graph Applications

• TinkerPop Product Suite

Page 101: Memoirs of a Graph Addict: Despair to Redemption

This is the TinkerPop...

Page 102: Memoirs of a Graph Addict: Despair to Redemption

TinkerPop Productions

• Blueprints: Data Models and their Implementations

[http://blueprints.tinkerpop.com]

• Pipes: A Data Flow Framework using Process Graphs

[http://pipes.tinkerpop.com]

• Gremlin: A Graph-Based Programming Language

[http://gremlin.tinkerpop.com]

• Rexster: A RESTful Graph Shell

[http://rexster.tinkerpop.com] 35

35 Please see http://engineering.attinteractive.com/2010/12/a-graph-processing-stack/ for a short review of these products. Also see TinkerPop’s homepage at http://tinkerpop.com

Page 103: Memoirs of a Graph Addict: Despair to Redemption

Blueprints: A Property Graph Model Interface

Blueprints

• Blueprints is like the JDBC of the graph database community.

• Provides a Java-based interface API for the property graph data model.

  - Graph, Vertex, Edge, Index.

• Connectors to TinkerGraph, Neo4j, OrientDB, Sails (e.g. AllegroGraph, HyperSail, etc.), and soon InfiniteGraph. In the future, we hope to support InfoGrid, Sones, DEX, and HyperGraphDB.36

36 HyperGraphDB makes use of an n-ary graph structure known as a hypergraph. Blueprints, in its current form, only supports the more common binary graph.

Page 104: Memoirs of a Graph Addict: Despair to Redemption

Creating a Neo4jGraph in Blueprints

// create a graph
Graph graph = new Neo4jGraph("/tmp/neo4j");
// add two vertices
Vertex a = graph.addVertex(null);
a.setProperty("name","marko");
Vertex b = graph.addVertex(null);
b.setProperty("name","peter");
// join the two vertices by a knows relation
Edge e = graph.addEdge(null,a,b,"knows");
e.setProperty("since","2007");

[Figure: vertex 0 (name=marko) joined to vertex 1 (name=peter) by a knows edge (since=2007).]

Page 105: Memoirs of a Graph Addict: Despair to Redemption

Handy Features of Blueprints

• Supports automatic transactions

  - graph.setTransactionMode(AUTOMATIC -or- MANUAL)
  - In automatic mode, every manipulation of the graph is wrapped in a transaction and committed.

• Supports automatic indices

  - graph.createIndex(AUTOMATIC -or- MANUAL)
  - In automatic mode, elements are added or removed from an index as their properties are manipulated.

• Utility Suite

  - Blueprints Sail makes a graphdb into a traversal-based RDF store.
  - GraphML Reader/Writer library.

Page 106: Memoirs of a Graph Addict: Despair to Redemption

Pipes: A Data Flow Framework using Process Graphs

Pipes

• Lazy data flow with support for Blueprints-based graph processing.

• Provides a collection of “pipes” (implement Iterable and Iterator) that are connected together to form processing pipelines.

  - Filters: ComparisonFilterPipe, RandomFilterPipe, etc.
  - Traversal: VertexEdgePipe, EdgeVertexPipe, PropertyPipe, etc.
  - Splitting/Merging: CopySplitPipe, RobinMergePipe, etc.
  - Logic: OrFilterPipe, AndFilterPipe, etc.

Page 107: Memoirs of a Graph Addict: Despair to Redemption

Pipes: Chained Iterators

This pipeline takes objects of type A and turns them into objects of type D through a sequence of processing pipes...37

[Figure: a Pipeline wrapping Pipe1 (A to B), Pipe2 (B to C), and Pipe3 (C to D); objects of type A enter and objects of type D exit.]

Pipe<A,D> pipeline =
  new Pipeline<A,D>(Pipe1<A,B>, Pipe2<B,C>, Pipe3<C,D>)

37 Though not discussed, splitting and merging are allowed as well (branching pipelines).
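The chained-iterator idea can be illustrated with standard Java iterators alone. The sketch below is not the Pipes library: map, filter, and toList are hypothetical helpers written for this example. Each stage wraps the previous iterator and does work only when the final consumer asks for the next element, which is what makes the data flow lazy.

```java
import java.util.*;
import java.util.function.Function;
import java.util.function.Predicate;

final class IteratorPipes {
    /** Lazily applies `f` to each element pulled from `starts`. */
    static <S, E> Iterator<E> map(Iterator<S> starts, Function<S, E> f) {
        return new Iterator<E>() {
            public boolean hasNext() { return starts.hasNext(); }
            public E next() { return f.apply(starts.next()); }
        };
    }

    /** Lazily drops elements that fail `keep`. */
    static <S> Iterator<S> filter(Iterator<S> starts, Predicate<S> keep) {
        return new Iterator<S>() {
            private S buffered;
            private boolean ready;
            public boolean hasNext() {
                // Pull from upstream until one element passes the predicate.
                while (!ready && starts.hasNext()) {
                    S candidate = starts.next();
                    if (keep.test(candidate)) { buffered = candidate; ready = true; }
                }
                return ready;
            }
            public S next() {
                if (!hasNext()) throw new NoSuchElementException();
                ready = false;
                return buffered;
            }
        };
    }

    /** Drains an iterator into a list (this is what triggers the lazy stages). */
    static <T> List<T> toList(Iterator<T> it) {
        List<T> out = new ArrayList<>();
        while (it.hasNext()) out.add(it.next());
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("graph", "db", "vertex");
        Iterator<Integer> lengths = map(words.iterator(), String::length); // A -> B
        Iterator<Integer> longOnes = filter(lengths, n -> n > 3);          // B -> B
        Iterator<String> labels = map(longOnes, n -> "len=" + n);          // B -> D
        System.out.println(toList(labels)); // prints [len=5, len=6]
    }
}
```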

Page 108: Memoirs of a Graph Addict: Despair to Redemption

Pipes: A Simple Example

“What are the names of the people that marko knows?”

[Figure: example graph with vertices A, B, C, and D, two knows edges and two created edges, and the vertex properties name=marko, name=peter, name=pavel, name=gremlin.]

Page 109: Memoirs of a Graph Addict: Despair to Redemption

Pipes: A Simple Example

// vertex -> its outgoing edges
Pipe<Vertex,Edge> pipe1 = new VertexEdgePipe(Step.OUT_EDGES);
// label filter for "knows"
Pipe<Edge,Edge> pipe2 = new LabelFilterPipe("knows",Filter.NOT_EQUAL);
// edge -> the vertex it points to
Pipe<Edge,Vertex> pipe3 = new EdgeVertexPipe(Step.IN_VERTEX);
// vertex -> its name property
Pipe<Vertex,String> pipe4 = new PropertyPipe<String>("name");
Pipe<Vertex,String> pipeline = new Pipeline<Vertex,String>(pipe1,pipe2,pipe3,pipe4);
pipeline.setStarts(new SingleIterator<Vertex>(graph.getVertex("A")));

[Figure: the example graph from Page 108 (vertices A, B, C, D; knows and created edges; name=marko, name=peter, name=pavel, name=gremlin).]

Page 110: Memoirs of a Graph Addict: Despair to Redemption

Pipes: A Simple Example

for(String name : pipeline) {
  System.out.println(name);
}

[Figure: the example graph from Page 108 (vertices A, B, C, D; knows and created edges; name=marko, name=peter, name=pavel, name=gremlin).]

Output:

peter
pavel

Page 111: Memoirs of a Graph Addict: Despair to Redemption

Pipes: A Simple Example

[Figure: the example graph from Page 108 annotated with the pipeline stages VertexEdgePipe(OUT_EDGES), LabelFilterPipe("knows"), EdgeVertexPipe(IN_VERTEX), and PropertyPipe("name").]

Page 112: Memoirs of a Graph Addict: Despair to Redemption

Pipes: A Simple Example

[Figure: animation frame of the annotated example graph; stages VertexEdgePipe(OUT_EDGES), LabelFilterPipe("knows"), EdgeVertexPipe(IN_VERTEX), PropertyPipe("name").]

Page 113: Memoirs of a Graph Addict: Despair to Redemption

Pipes: A Simple Example

[Figure: animation frame of the annotated example graph; stages VertexEdgePipe(OUT_EDGES), LabelFilterPipe("knows"), EdgeVertexPipe(IN_VERTEX), PropertyPipe("name").]

Page 114: Memoirs of a Graph Addict: Despair to Redemption

Pipes: A Simple Example

[Figure: animation frame of the annotated example graph; stages VertexEdgePipe(OUT_EDGES), LabelFilterPipe("knows"), EdgeVertexPipe(IN_VERTEX), PropertyPipe("name").]

Page 115: Memoirs of a Graph Addict: Despair to Redemption

Pipes: Library of Generally Useful Pipes

[ FILTERS ]

AndFilterPipe

CollectionFilterPipe

ComparisonFilterPipe

DuplicateFilterPipe

FutureFilterPipe

ObjectFilterPipe

OrFilterPipe

RandomFilterPipe

RangeFilterPipe

[ SPLITS ]

CopySplitPipe

RobinSplitPipe

[ MERGES ]

ExhaustiveMergePipe

RobinMergePipe

[ GRAPHS ]

EdgeVertexPipe

IdFilterPipe

IdPipe

LabelFilterPipe

LabelPipe

PropertyFilterPipe

PropertyPipe

VertexEdgePipe

[ SIDEEFFECTS ]

AggregatorPipe

CountCombinePipe

CountPipe

KeyCombinePipe

SideEffectCapPipe

[ UTILITIES ]

DynamicStartsPipe

GatherPipe

PathPipe

PrintStreamPipe

ProductPipe

ScatterPipe

TypeCastPipe

Pipeline

...

Page 116: Memoirs of a Graph Addict: Despair to Redemption

Pipes: Easy to Create New Pipes

public class NumCharsPipe extends AbstractPipe<String,Integer> {
  public Integer processNextStart() {
    String word = this.starts.next();
    return word.length();
  }
}

When extending the base class AbstractPipe<S,E> all that is required is an implementation of processNextStart().
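To show why a single processNextStart() method can be enough, here is a self-contained Java sketch with a hypothetical base class, MiniPipe, standing in for AbstractPipe. How the real AbstractPipe is implemented is not shown in the slides, so the buffering logic below is an assumption about one reasonable design: the base class turns "produce one result or throw NoSuchElementException" into a full Iterator.

```java
import java.util.*;

final class CustomPipeSketch {
    /** Hypothetical stand-in for a pipe base class: S in, E out. */
    static abstract class MiniPipe<S, E> implements Iterator<E> {
        protected Iterator<S> starts;
        private E nextEnd;
        private boolean available;

        void setStarts(Iterator<S> starts) { this.starts = starts; }

        /** Produce the next result, or throw NoSuchElementException when exhausted. */
        protected abstract E processNextStart() throws NoSuchElementException;

        public boolean hasNext() {
            if (available) return true;
            try {
                nextEnd = processNextStart();
                available = true;
                return true;
            } catch (NoSuchElementException e) {
                return false;
            }
        }

        public E next() {
            if (!hasNext()) throw new NoSuchElementException();
            available = false;
            return nextEnd;
        }
    }

    /** Emits the number of characters of each incoming string. */
    static final class NumCharsPipe extends MiniPipe<String, Integer> {
        protected Integer processNextStart() {
            return starts.next().length();
        }
    }

    public static void main(String[] args) {
        NumCharsPipe pipe = new NumCharsPipe();
        pipe.setStarts(Arrays.asList("marko", "peter", "gremlin").iterator());
        while (pipe.hasNext()) {
            System.out.println(pipe.next());
        }
    }
}
```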

Page 117: Memoirs of a Graph Addict: Despair to Redemption

Pipes: Easy to Create New Pipes

[Figure: layered diagram. From bottom to top: com.tinkerpop.pipes, domain specific, complex traversal algorithms.]

Most of my projects are composed of lots of application-specific Pipes. That is, Pipes that are specific to my domain model and yield useful jumps in the graph. For example, SameLikesPipe<Vertex,Vertex>.

From these domain-specific Pipes, complex algorithms are created through the piecing together of those Pipes. For example, RecommenderPipe<Vertex,Map>.

Page 118: Memoirs of a Graph Addict: Despair to Redemption

Gremlin: A Graph-Based Programming Language

[Logo: Gremlin, G = (V,E)]

• A graph traversal language that uses Groovy as its host language.

• Compiles Gremlin syntax down to Pipes (implements JSR 223).38

38 At the time of this presentation, Gremlin’s most recent stable release is 0.6, which is a standalone language. To increase the flexibility of the language, 0.7-SNAPSHOT+ boasts the use of Groovy as the host language.

Page 119: Memoirs of a Graph Addict: Despair to Redemption

Gremlin: Easily Compose Graph Related Pipes

Pipes is verbose...

Pipe<Vertex,Edge> pipe1 = new VertexEdgePipe(Step.OUT_EDGES);
Pipe<Edge,Edge> pipe2 = new LabelFilterPipe("knows",Filter.NOT_EQUAL);
Pipe<Edge,Vertex> pipe3 = new EdgeVertexPipe(Step.IN_VERTEX);
Pipe<Vertex,String> pipe4 = new PropertyPipe<String>("name");
Pipe<Vertex,String> pipeline = new Pipeline<Vertex,String>(pipe1,pipe2,pipe3,pipe4);
pipeline.setStarts(new SingleIterator<Vertex>(graph.getVertex("A")));

...relative to Gremlin.

g.v('A').outE[[label:'knows']].inV.name

Page 120: Memoirs of a Graph Addict: Despair to Redemption

Gremlin: The Simple Example

[Figure: the example graph from Page 108 annotated with the Gremlin steps g.v('A'), outE, [[label:'knows']], inV, and name.]

Page 121: Memoirs of a Graph Addict: Despair to Redemption

Gremlin: Defining a Step

“Who likes the same things that I like?”

Vertex.metaClass.same_likes =
  { _().outE[[label:'likes']].inV.inE[[label:'likes']].outV }

[Figure: graph with vertices A, B, C, D, E, F, and G connected by seven likes edges.]

Page 122: Memoirs of a Graph Addict: Despair to Redemption

Gremlin: Defining a Step

gremlin> g.v('A').same_likes
==>v[E]
==>v[F]
==>v[F]
==>v[G]

[Figure: graph with vertices A, B, C, D, E, F, and G connected by seven likes edges.]

Page 123: Memoirs of a Graph Addict: Despair to Redemption

Gremlin: Defining a Step

gremlin> m = g.v('A').same_likes.group_count >> 1
gremlin> m
==>v[E]=1
==>v[F]=2
==>v[G]=1

v[F] is most similar, in terms of likes, to v[A].39

39 For a thorough review of such traversal patterns, please see: Rodriguez, M.A., “Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Recommendation,” July 2010. [http://slidesha.re/bOCy4Q]
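The counting step can be mimicked in plain Java to show what the map m holds. This is a sketch under stated assumptions: the input list simply replays the four results shown for same_likes, and groupCount and mostFrequent are hypothetical helpers, not Gremlin functions.

```java
import java.util.*;

final class GroupCountSketch {
    /** Counts how many times each traversal result occurs. */
    static <T> Map<T, Integer> groupCount(Iterable<T> results) {
        Map<T, Integer> counts = new LinkedHashMap<>();
        for (T r : results) {
            counts.merge(r, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns the key with the highest count, or null for an empty map. */
    static <T> T mostFrequent(Map<T, Integer> counts) {
        T best = null;
        int bestCount = -1;
        for (Map.Entry<T, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The four results listed on the previous slide: E, F, F, G.
        Map<String, Integer> m = groupCount(Arrays.asList("E", "F", "F", "G"));
        System.out.println(m);               // prints {E=1, F=2, G=1}
        System.out.println(mostFrequent(m)); // prints F
    }
}
```

A vertex reached over more paths gets a higher count, so the duplicates in the traversal output carry the ranking signal.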

Page 124: Memoirs of a Graph Addict: Despair to Redemption

Rexster: A RESTful Graph Shell

reXster

• Allows Blueprints graphs to be exposed through a RESTful API (HTTP).

• All communication is via JSON.

• Supports stored traversals written in raw Pipes or Gremlin.

• Supports ad hoc traversals represented in Gremlin.

• Provides “helper classes” for performing search-, score-, and rank-based traversal algorithms that, in concert, support recommendation.

Page 125: Memoirs of a Graph Addict: Despair to Redemption

Rexster: URI Patterns

• http://localhost/graph/vertices: all the vertices in the graph

• http://localhost/graph/vertices/1: vertex with id 1 in the graph.

• http://localhost/graph/vertices/1/outE: outgoing edges of vertex with id 1.

{ "results": {
    "_type":"vertex",
    "_id":"1",
    "name":"aaron",
    "type":"person"
  },
  "query_time":0.1537 }

Page 126: Memoirs of a Graph Addict: Despair to Redemption

Typical TinkerPop Graph Stack

NativeStore TinkerGraph Neo4j

GET http://{host}/{resource}

Page 127: Memoirs of a Graph Addict: Despair to Redemption

Conclusion

• Property graphs are convenient structures for modeling the real world.

• Graph databases provide index-free adjacency to ensure speedy traversal over graphs.

• The graph is such a general data structure that it can be used for numerous applications.

• TinkerPop provides a database-agnostic stack of technologies for working with property graphs.

Page 128: Memoirs of a Graph Addict: Despair to Redemption

Acknowledgements

• Research collaborators: Daniel Steinbock (Stanford), Jennifer H. Watkins (LANL), Alberto Pepe (Harvard), Joshua Shinavier (RPI), Johan Bollen (LANL), Herbert Van de Sompel (LANL).

• TinkerPop contributors: Pavel Yaskevich (Riptano), Stephen Mallette (Independent), Darrick Wiebe (Independent), Alex Averbuch (Swedish Institute of CS), Peter Neubauer (Neo4j).

• Others: Emil Eifrem (Neo4j), Luca Garulli (Orient Technologies), Aaron Patterson (AT&Ti).

Page 129: Memoirs of a Graph Addict: Despair to Redemption

http://tinkerpop.spreadshirt.com