Titan and Cassandra at WellAware

Ted Wilmes

Topics

● The property graph model

● The graph ecosystem

● Titan overview

● Titan at WellAware

Property graph model

[Figure: example property graph]

● Vertices: company (name: Acme Trucks), person (firstName: Susan), truck (license: ABC123, year: 2013)

● Edges: owns (bought: 2012), employs (hired: 2012), drives
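To make the model concrete, here is a minimal in-memory sketch of the graph on this slide in plain Python. It is not Titan's or TinkerPop's API: the `Vertex`/`Edge` classes and the numeric ids are hypothetical, and the edge directions (company owns truck, company employs person, person drives truck) are assumed from the labels. The point is that vertices and edges both carry a label plus a map of key/value properties, and every edge is directed.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Vertex:
    id: int  # hypothetical id; Titan assigns its own
    label: str
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_v: Vertex  # the vertex the edge starts from
    in_v: Vertex   # the vertex the edge points to
    properties: dict[str, Any] = field(default_factory=dict)


company = Vertex(1, "company", {"name": "Acme Trucks"})
susan = Vertex(2, "person", {"firstName": "Susan"})
truck = Vertex(3, "truck", {"license": "ABC123", "year": 2013})

edges = [
    Edge("owns", company, truck, {"bought": 2012}),
    Edge("employs", company, susan, {"hired": 2012}),
    Edge("drives", susan, truck),  # an edge with no properties
]

# Labels of the edges pointing at the truck
incoming = sorted(e.label for e in edges if e.in_v is truck)
print(incoming)  # ['drives', 'owns']
```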

Sampling of graph projects/vendors

Apache TinkerPop - tying it all together!

● Gremlin Server
○ Remote access to TinkerPop-compliant graph databases for JVM and non-JVM clients

● Gremlin
○ Graph query and processing language

● Core API
○ add vertex
○ add edge
○ add/update properties
○ simple queries (adjacent edge/vertex retrieval)

http://tinkerpop.incubator.apache.org/

Gremlin in action

[Figure: example graph with vertices vehicle (license: ABC123, year: 2013), person (firstName: Susan), person (firstName: Tom), company (name: Acme Trucks) and edges owns (bought: 2012), employs (hired: 2012), employs (hired: 2014), drives]

● Add vertices and edges

● Retrieving vertices

● Basic vertex filtering

● Querying adjacent edges and vertices

Building the graph

// Open a Titan graph backed by Cassandra
graph = TitanFactory.open('conf/titan-cassandra.properties')

// Add vertices
acmeTrucks = graph.addVertex(T.label, "company", "name", "Acme Trucks")
susan = graph.addVertex(T.label, "person", "firstName", "Susan")
tom = graph.addVertex(T.label, "person", "firstName", "Tom")
truck = graph.addVertex(T.label, "vehicle", "license", "ABC123", "year", 2012)

// Connect vertices with edges
edge = acmeTrucks.addEdge("owns", truck)
edge.property("bought", 2012)
acmeTrucks.addEdge("employs", susan).property("hired", 2012)
acmeTrucks.addEdge("employs", tom).property("hired", 2014)
tom.addEdge("drives", truck)

// Commit the transaction so the new elements are persisted to Cassandra
graph.tx().commit()

Retrieving vertices

// Get a traversal source so that we can run some queries
g = graph.traversal(standard())

gremlin> g.V()
==>v[0]
==>v[2]
==>v[4]
==>v[6]

// Get the properties for each vertex
gremlin> g.V().valueMap()
==>[name:[Acme Trucks]]
==>[firstName:[Susan]]
==>[firstName:[Tom]]
==>[license:[ABC123], year:[2012]]

Basic vertex filtering

// Retrieve all people with firstName Susan
gremlin> g.V().hasLabel("person").has("firstName", "Susan")
==>v[2]

// Retrieve all people with firstName Susan or Tom
gremlin> g.V().hasLabel("person").has("firstName", within("Susan", "Tom"))
==>v[2]
==>v[4]

Querying adjacent edges and vertices

// Count how many people Acme Trucks employs
gremlin> g.V().hasLabel("company").has("name", "Acme Trucks").out("employs").count()
==>2

// How many employees were hired in 2012?
gremlin> g.V().hasLabel("person").where(inE("employs").has("hired", 2012)).count()
==>1

// Which employees drive a truck?
gremlin> g.V().hasLabel("company").has("name", "Acme Trucks").out("employs").as("driver").out("drives").select("driver").values("firstName")
==>Tom

// Show me all of the drivers that were hired before 2015
gremlin> g.V().hasLabel("person").and(inE("employs").values("hired").is(lt(2015)), out("drives")).values("firstName")
==>Tom
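Each Gremlin step behaves like a lazy filter or flat-map over a stream of elements. The sketch below replays the four queries above over a hypothetical in-memory copy of the example graph using plain Python generators. The helper names (`has_label`, `has`, `out`, `in_e`) only mirror the Gremlin steps for illustration; they are not a real Gremlin client API.

```python
# Hypothetical in-memory copy of the example graph (ids match the console output)
vertices = {
    0: {"label": "company", "name": "Acme Trucks"},
    2: {"label": "person", "firstName": "Susan"},
    4: {"label": "person", "firstName": "Tom"},
    6: {"label": "vehicle", "license": "ABC123", "year": 2012},
}
# Each edge: (out-vertex id, label, in-vertex id, edge properties)
edges = [
    (0, "owns", 6, {"bought": 2012}),
    (0, "employs", 2, {"hired": 2012}),
    (0, "employs", 4, {"hired": 2014}),
    (4, "drives", 6, {}),
]


def has_label(vids, label):  # ~ hasLabel(label)
    return (v for v in vids if vertices[v]["label"] == label)


def has(vids, key, value):  # ~ has(key, value)
    return (v for v in vids if vertices[v].get(key) == value)


def out(vids, label):  # ~ out(label): one result per matching edge
    return (dst for v in vids for src, lbl, dst, _ in edges if src == v and lbl == label)


def in_e(v, label):  # ~ inE(label) for a single vertex
    return [e for e in edges if e[2] == v and e[1] == label]


def acme():
    return has(has_label(vertices, "company"), "name", "Acme Trucks")


# g.V().hasLabel("company").has("name", "Acme Trucks").out("employs").count()
employee_count = sum(1 for _ in out(acme(), "employs"))

# g.V().hasLabel("person").where(inE("employs").has("hired", 2012)).count()
hired_2012 = sum(1 for v in has_label(vertices, "person")
                 if any(e[3].get("hired") == 2012 for e in in_e(v, "employs")))

# ...out("employs").as("driver").out("drives").select("driver").values("firstName")
drivers = [vertices[v]["firstName"]
           for v in out(acme(), "employs")
           for _ in out([v], "drives")]

# g.V().hasLabel("person").and(inE("employs").values("hired").is(lt(2015)), out("drives")).values("firstName")
early_drivers = [vertices[v]["firstName"] for v in has_label(vertices, "person")
                 if any(e[3]["hired"] < 2015 for e in in_e(v, "employs"))
                 and any(True for _ in out([v], "drives"))]

print(employee_count, hired_2012, drivers, early_drivers)
# 2 1 ['Tom'] ['Tom']
```

As in Gremlin, `drivers` yields one result per `drives` edge, so an employee who drives two trucks would appear twice unless a `dedup()`-style step is added.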

Many more steps...

● AddEdge Step
● AddVertex Step
● AddProperty Step
● Aggregate Step
● And Step
● As Step
● By Step
● Cap Step
● Coalesce Step
● Count Step
● Choose Step
● Coin Step
● CyclicPath Step
● Dedup Step
● Drop Step
● Fold Step
● Group Step
● GroupCount Step
● Has Step
● Inject Step
● Is Step
● Limit Step
● Local Step
● Match Step
● ...

GraphComputer for global graph processing

● Use cases
○ full graph traversal
○ parallel processing
○ batch import/export

● Examples
○ PageRank
○ vertex count
○ mass schema update

● Gremlin OLAP implementations
○ Hadoop
○ Spark
○ Giraph
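PageRank, listed above as an example workload, needs global processing because every vertex's score depends on the whole graph. The sketch below is plain, single-process Python that mimics the superstep/message-passing shape of a GraphComputer vertex program; it is not TinkerPop's own PageRank vertex program. It assumes every edge target also appears as a key in the input map, and for brevity it does not redistribute rank from dangling vertices (vertices with no out-edges).

```python
def pagerank(out_edges, damping=0.85, supersteps=50):
    """Synchronous PageRank in vertex-program style.

    out_edges maps each vertex to the list of vertices it points to.
    In each superstep every vertex sends rank / out_degree to its
    out-neighbours, then every vertex recomputes its rank from the
    messages it received. Rank held by dangling vertices is dropped.
    """
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    for _ in range(supersteps):
        received = {v: 0.0 for v in out_edges}
        for v, targets in out_edges.items():  # "send messages" phase
            for w in targets:
                received[w] += rank[v] / len(targets)
        rank = {v: (1 - damping) / n + damping * received[v]  # "update" phase
                for v in out_edges}
    return rank


ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print(max(ranks, key=ranks.get))  # c
```

In a real GraphComputer the per-vertex work runs in parallel across workers (for example on Spark or Giraph), and only the messages cross machine boundaries.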

Graph use cases

● Social network analysis

● Fraud detection

● Recommendation systems

● Route optimization

● IoT

● Master data management

TitanDB

● What is Titan?

● Data store options

● Deployment options

● Titan Cassandra data model

● Titan specific graph features

TitanDB

● Graph layer that can use a variety of data stores as backends, depending on user requirements
○ HBase
○ Berkeley DB
○ Cassandra
○ insert your favorite key/value or BigTable-style data store
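Titan can swap backends because what it needs underneath is a BigTable-style key-column-value store: rows addressed by a key, each holding columns kept sorted by column name, with the ability to read a contiguous range (a slice) of columns. Below is a minimal in-memory Python sketch of that contract. The class and method names are hypothetical and only loosely inspired by Titan's storage abstraction; they are not its actual Java interface.

```python
import bisect


class InMemoryKCVStore:
    """Hypothetical key-column-value store: each row key maps to a list of
    (column, value) pairs kept sorted by column, so ranges can be sliced."""

    def __init__(self):
        self._rows = {}  # row key -> sorted list of (column, value)

    def mutate(self, key, additions=(), deletions=()):
        row = self._rows.setdefault(key, [])
        for col in deletions:
            i = bisect.bisect_left(row, (col,))
            if i < len(row) and row[i][0] == col:
                del row[i]
        for col, val in additions:
            i = bisect.bisect_left(row, (col,))  # first column >= col
            if i < len(row) and row[i][0] == col:
                row[i] = (col, val)  # overwrite an existing column
            else:
                row.insert(i, (col, val))

    def get_slice(self, key, start, end):
        """Columns c with start <= c < end, in column order."""
        row = self._rows.get(key, [])
        lo = bisect.bisect_left(row, (start,))
        hi = bisect.bisect_left(row, (end,))
        return row[lo:hi]


store = InMemoryKCVStore()
store.mutate("v1", additions=[("employs:2014", "v4"), ("owns:2012", "v6"), ("employs:2012", "v2")])
# ';' sorts immediately after ':', so this slice covers every "employs:..." column
print(store.get_slice("v1", "employs:", "employs;"))
# [('employs:2012', 'v2'), ('employs:2014', 'v4')]
```

Any store that can provide sorted columns and range reads per row (Cassandra, HBase, Berkeley DB) can back the same graph layer; the trade-offs on the next slide come from the stores' differing consistency and operational characteristics.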

Which data store is right for you?

● Things to think about
○ data volume
○ CAP
○ ACID
○ read/write requirements
○ ops implications
○ your current infrastructure

http://s3.thinkaurelius.com/docs/titan/0.5.4/benefits.html

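[Figure: Titan deployment options: over a socket (application and Titan in separate JVMs on separate nodes) or embedded (Titan inside the application's JVM)]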

A Titan cluster with access options

[Figure: cluster of five nodes, each running Titan and C* (Cassandra)]

● Access options
○ Titan < 0.9
■ Rexster
■ dependency of your app
○ Titan 0.9+
■ Gremlin Server
■ dependency of your app
○ Object-to-graph mappers
■ Python - Mogwai, Bulbs
■ JVM - Totorom, Frames

● Titan does not need to run on every node; all communication between Titan instances goes through C*

Titan installation

● Download and unzip the latest milestone

● Cassandra footprint
○ Titan keyspace
○ Column families
■ edgestore
■ edgestore_lock_
■ graphindex
■ graphindex_lock_
■ titan_ids
■ ...

./bin/titan.sh start
Forking Cassandra...
Running `nodetool statusthrift`.. OK (returned exit status 0 and printed string "running").
Forking Elasticsearch...
Connecting to Elasticsearch (127.0.0.1:9300). OK (connected to 127.0.0.1:9300).
Forking Gremlin-Server...
Connecting to Gremlin-Server (127.0.0.1:8182)...... OK (connected to 127.0.0.1:8182).
Run gremlin.sh to connect.

Vertex and edge storage format

[Figure: Cassandra Thrift data model compared with the Titan storage format]

Edge and property serialization
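Titan stores the graph in adjacency-list form: each vertex is a row keyed by its id, with one column per adjacent edge or property. The sketch below illustrates why that layout supports efficient slices. If a column name is built from fixed-width big-endian fields (label, direction, sort key, adjacent vertex id), then sorting the raw bytes groups all edges with the same label together, ordered by the sort key. The field layout, widths, and label ids are assumptions for illustration; Titan's real serializer uses its own, more compact encoding.

```python
import bisect
import struct

# Hypothetical ids for edge labels; Titan assigns its own internal ids.
LABEL_IDS = {"owns": 1, "employs": 2, "drives": 3}
OUT, IN = 0, 1
COLUMN = struct.Struct(">HBIQ")  # label id, direction, sort key, adjacent vertex id


def edge_column(label, direction, sort_key, other_vertex_id):
    """Encode one edge as a column name. Big-endian, fixed-width unsigned
    fields compare byte-by-byte in the same order as the numbers they hold."""
    return COLUMN.pack(LABEL_IDS[label], direction, sort_key, other_vertex_id)


def label_prefix(label, direction):
    return struct.pack(">HB", LABEL_IDS[label], direction)


# Row for the Acme Trucks vertex: the store keeps column names sorted.
row = sorted([
    edge_column("employs", OUT, 2014, 4),
    edge_column("owns", OUT, 2012, 6),
    edge_column("employs", OUT, 2012, 2),
])

# All outgoing "employs" edges form one contiguous run of columns.
prefix = label_prefix("employs", OUT)
employs = []
for col in row[bisect.bisect_left(row, prefix):]:
    if not col.startswith(prefix):
        break  # sorted order guarantees no later column can match
    employs.append(COLUMN.unpack(col))

print(employs)  # [(2, 0, 2012, 2), (2, 0, 2014, 4)]
```

Because matching edges are contiguous, the backend can answer "all `employs` edges of this vertex" with a single slice read instead of scanning the whole row.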

Schema definition

● Properties
○ data type - string, float, char, geoshape, etc.
○ cardinality - single, list, set
○ uniqueness (through Titan’s indexing system)

● Edges
○ labels
○ multiplicity - one-to-one, many-to-one, one-to-many

● Vertices
○ labels

● Advanced
○ edge, vertex, and property TTL
○ multi-properties and meta-properties (properties on properties, e.g. audit info)
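The three property cardinalities differ only in what happens when a second value is written to the same key. A plain-Python sketch of those semantics follows. The `add_property` helper and the keys `alias` and `reading` are hypothetical; in Titan the cardinality is declared once per key in the schema (via the management API, e.g. `makePropertyKey(...).cardinality(...)`) and enforced by the database.

```python
from enum import Enum


class Cardinality(Enum):
    SINGLE = "single"  # at most one value per key; a new write replaces it
    LIST = "list"      # any number of values, duplicates allowed
    SET = "set"        # any number of values, duplicates ignored


def add_property(props, key, value, cardinality):
    """Apply one write to a vertex's property map (key -> list of values).
    Hypothetical helper illustrating the semantics, not Titan's API."""
    values = props.setdefault(key, [])
    if cardinality is Cardinality.SINGLE:
        values[:] = [value]
    elif cardinality is Cardinality.SET:
        if value not in values:
            values.append(value)
    else:  # Cardinality.LIST
        values.append(value)
    return props


v = {}
add_property(v, "name", "Acme", Cardinality.SINGLE)
add_property(v, "name", "Acme Trucks", Cardinality.SINGLE)
add_property(v, "alias", "ACME", Cardinality.SET)
add_property(v, "alias", "ACME", Cardinality.SET)
add_property(v, "reading", 42, Cardinality.LIST)
add_property(v, "reading", 42, Cardinality.LIST)
print(v)  # {'name': ['Acme Trucks'], 'alias': ['ACME'], 'reading': [42, 42]}
```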

Global indexing options

● Supports composite keys

● Titan indexing provider
○ fast!
○ exact matches only

● External providers
○ not as fast
○ many options beyond exact matching (wildcards, geosearch, etc.)
○ providers
■ Elasticsearch
■ Lucene
■ Solr

I want that one!
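Titan's own (composite) index behaves essentially like a hash lookup from the exact tuple of indexed values to the matching elements. That is why it is fast, and also why it only answers queries with an equality condition on every one of its keys; wildcard, range, or geo queries need an external provider. A minimal Python sketch, with hypothetical class and method names and a hypothetical second vehicle (id 8):

```python
from collections import defaultdict


class CompositeIndex:
    """Hypothetical exact-match index over a fixed tuple of property keys."""

    def __init__(self, *keys):
        self.keys = keys
        self._entries = defaultdict(set)  # tuple of values -> vertex ids

    def add(self, vertex_id, props):
        # Only vertices that have every indexed key get an entry
        if all(k in props for k in self.keys):
            self._entries[tuple(props[k] for k in self.keys)].add(vertex_id)

    def lookup(self, **conditions):
        # An equality condition on every key is required; prefix, wildcard,
        # range, or geo conditions cannot be answered by a hash lookup.
        if set(conditions) != set(self.keys):
            raise ValueError(f"need equality conditions on exactly {self.keys}")
        return self._entries.get(tuple(conditions[k] for k in self.keys), set())


idx = CompositeIndex("license", "year")
idx.add(6, {"license": "ABC123", "year": 2012})
idx.add(8, {"license": "XYZ789", "year": 2012})
print(idx.lookup(license="ABC123", year=2012))  # {6}
```

A query such as "license starts with ABC" has no matching hash key, which is the gap that Elasticsearch, Lucene, or Solr fill.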

Vertex-centric indices

● Adjacent edge counts can grow quite large in certain situations and form supernodes

● Supports composite keys and ordering of edges to speed up vertex-centric queries
○ translates into slice queries over the edges
○ efficiently retrieves ranges of edges or satisfies top-n queries

[Figure: company (name: Acme Trucks) with employs edges hired: 2013, hired: 2014, hired: 2015]
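A vertex-centric index keeps one vertex's edges of a given label sorted by a chosen key (here `hired`), so a range or top-n query becomes a binary search plus a contiguous slice instead of a scan over every adjacent edge. The sketch below is plain Python with hypothetical names (`VertexCentricIndex`, person ids `p1`..`p3`); in Titan the sorted order lives in the column order of the vertex's row.

```python
import bisect


class VertexCentricIndex:
    """Hypothetical per-vertex index: one label's edges kept sorted by a sort key."""

    def __init__(self):
        self._keys = []   # sort key values, ascending
        self._edges = []  # adjacent vertex ids, parallel to _keys

    def add(self, sort_key, other_vertex):
        i = bisect.bisect_right(self._keys, sort_key)
        self._keys.insert(i, sort_key)
        self._edges.insert(i, other_vertex)

    def between(self, low, high):
        """Edges with low <= sort key < high, found without scanning the others."""
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_left(self._keys, high)
        return self._edges[lo:hi]

    def top(self, n):
        """The n edges with the largest sort key (e.g. the most recent hires)."""
        return self._edges[-n:][::-1] if n > 0 else []


employs = VertexCentricIndex()
for hired, person in [(2015, "p3"), (2013, "p1"), (2014, "p2")]:
    employs.add(hired, person)
print(employs.between(2014, 2016))  # ['p2', 'p3']
print(employs.top(1))               # ['p3']
```

For a supernode with millions of `employs` edges, the cost of these queries depends on the size of the answer rather than on the vertex's degree.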

Graph partitioning with ByteOrderedPartitioner

Vertex cuts

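[Figure: supernode illustration (labels: 1, 2,000,000)]

// Declare 'user' as a partitioned vertex label so that vertices with this label are split across partitions (vertex cut)
mgmt = graph.openManagement()
mgmt.makeVertexLabel('user').partition().make()
mgmt.commit()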
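Without a vertex cut, all of a supernode's edges sit in one adjacency-list row, i.e. one Cassandra partition served by one set of replicas. Declaring the label as partitioned lets Titan split such a vertex into representatives stored on different partitions. The arithmetic below shows the effect on the largest row for 2,000,000 edges. The assignment rule (neighbour id modulo partition count) and the 16 partitions are assumptions for illustration only, not Titan's actual placement logic.

```python
from collections import Counter


def split_supernode(neighbour_ids, num_partitions):
    """Count how many of a supernode's edges each partition's representative
    would hold. Hypothetical rule: neighbour id modulo the partition count."""
    return Counter(n % num_partitions for n in neighbour_ids)


edges_per_rep = split_supernode(range(2_000_000), num_partitions=16)
print(len(edges_per_rep), max(edges_per_rep.values()))  # 16 125000
```

The largest row shrinks from 2,000,000 columns to 125,000, at the price of touching several partitions whenever the whole adjacency list of that vertex is read.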

Edge cuts

[Figure: Company A and Company B]
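The Company A / Company B picture points at the other approach: place vertices that are traversed together in the same partition so that few edges cross partitions. A small Python sketch that scores a placement by its edge cut; the graph, vertex names, and both placements are hypothetical.

```python
def edge_cut(edges, partition_of):
    """Count edges whose endpoints are stored in different partitions;
    each such edge costs a cross-partition hop during traversal."""
    return sum(1 for a, b in edges if partition_of[a] != partition_of[b])


# Hypothetical graph: two companies, each with its own assets
edges = [("companyA", "wellA1"), ("companyA", "tankA1"),
         ("companyB", "wellB1"), ("companyB", "tankB1")]

by_company = {"companyA": 0, "wellA1": 0, "tankA1": 0,
              "companyB": 1, "wellB1": 1, "tankB1": 1}
scattered = {"companyA": 0, "wellA1": 1, "tankA1": 0,
             "companyB": 1, "wellB1": 0, "tankB1": 1}

print(edge_cut(edges, by_company), edge_cut(edges, scattered))  # 0 2
```

Grouping by company works when most traversals stay within one company; queries that span companies would still cross partitions.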

A bit more about WellAware

● Founded in 2012
● Full-stack oil & gas monitoring solution
● iOS, Android, and web clients
● Connecting to field assets over RPMA, cellular, and satellite

Functionality and high-level architecture

● Remote data collection
● Mobile data collection
● Asset control
● Derived measurements
● Alarming
● Reporting

[Figure: high-level architecture with Poller, Django, Titan, WAN, and ESB components]

Moving to Titan

● 2013
○ Running Django against PostgreSQL and, for a while, TempoDB

● Beginning of 2014 - started using Titan 0.4.4 to capture relationships between assets and for derived measurements

● March 2014 - deployed a 3-node Cassandra cluster and moved the rest of the backend (minus auth) over to Titan 0.4.4

● Today - 3-node DC for OLTP and 2-node reporting DC
○ still on Titan 0.4.4, waiting for Titan 1.0 to be released and hardened
○ after Titan 1.0, we’re looking forward to trying out DSE Graph

A common well pad configuration

[Figure: well pad with labeled well & pumpjack and tanks]

Sample of model

[Figure: sample of the asset model, with elements O&G Co., TankSite, and Top Gauge]

Zooming in on a well pad

[Figure: well pad equipment: well, meters, separator, tanks, and compressor]

Lessons learned

● No native integration with third-party BI tools (reports, dashboards, ad hoc queries)
○ Apache Calcite-based JDBC driver that translates SQL to graph queries

● Colocating Titan, some of your application code, and Cassandra on the same nodes: what’s the right separation?

● Out-of-the-box framework support is lacking (no native Spring or Dropwizard support)

● Performance tuning requires knowledge of Titan AND Cassandra

● Play to the strengths of Cassandra and the adjacency-list storage format

● You can’t hide from tombstones!!!

Graph and Titan resources

● TinkerPop docs - http://www.tinkerpop.com/docs/3.0.0.M6/
● Titan docs - http://s3.thinkaurelius.com/docs/titan/0.9.0-M2/
● Titan Google group - https://groups.google.com/forum/#!forum/aureliusgraphs
● Gremlin Google group - https://groups.google.com/forum/#!forum/gremlin-users
● O’Reilly graph ebook (focuses on Neo4j but has generally applicable graph info) - http://graphdatabases.com/
● Java OGM - https://github.com/BrynCooke/totorom
● Python OGM - https://mogwai.readthedocs.org/en/latest/

Thanks and what questions do you have?