Titan and Cassandra at WellAware
Uploaded by: twilmes
Category: Data & Analytics
Topics
● The property graph model
● The graph ecosystem
● Titan overview
● Titan at WellAware
Property graph model
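[Figure: example property graph]
● Vertex - label: truck, license: ABC123, year: 2013
● Vertex - label: person, firstName: Susan
● Vertex - label: company, name: Acme Trucks
● Edge - owns, bought: 2012
● Edge - employs, hired: 2012
● Edge - drives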
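To make the model concrete, here is a minimal sketch of the same figure as plain Python objects. It is not Titan or TinkerPop code; the class names, the `out_neighbors` helper, and the edge directions (taken from the Gremlin example later in the deck) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_vertex: Vertex  # where the edge starts
    in_vertex: Vertex   # where the edge points
    properties: dict = field(default_factory=dict)


# Vertices from the figure: each has a label and key/value properties.
truck = Vertex("truck", {"license": "ABC123", "year": 2013})
susan = Vertex("person", {"firstName": "Susan"})
acme = Vertex("company", {"name": "Acme Trucks"})

# Edges from the figure: labeled, directed (directions assumed), and
# able to carry their own properties.
edges = [
    Edge("owns", acme, truck, {"bought": 2012}),
    Edge("employs", acme, susan, {"hired": 2012}),
    Edge("drives", susan, truck),
]


def out_neighbors(vertex, edge_label):
    """Vertices reached by following outgoing edges with the given label."""
    return [e.in_vertex for e in edges
            if e.out_vertex is vertex and e.label == edge_label]


print([v.properties["firstName"] for v in out_neighbors(acme, "employs")])
```

The point of the sketch is that both vertices and edges are first-class records with a label and a property map, which is what distinguishes a property graph from a plain adjacency matrix.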
Sampling of graph projects/vendors
Apache TinkerPop - tying it all together!
● Gremlin Server
  ○ Remote access to TinkerPop-compliant graph DBs for JVM & non-JVM clients
● Gremlin
  ○ Graph query and processing language
● Core API
  ○ add vertex
  ○ add edge
  ○ add/update properties
  ○ simple queries (adjacent edge/vertex retrieval)
http://tinkerpop.incubator.apache.org/
Gremlin in action
[Figure: example graph. Vertices: vehicle (license: ABC123, year: 2013), person (firstName: Susan), person (firstName: Tom), company (name: Acme Trucks). Edges: employs (hired: 2012), employs (hired: 2014), owns (bought: 2012), drives.]
● Add vertices and edges
● Retrieving vertices
● Basic vertex filtering
● Querying adjacent edges and vertices
Building the graph
// Open the graph, then add vertices
graph = TitanFactory.open('conf/titan-cassandra.properties')
acmeTrucks = graph.addVertex(T.label, "company", "name", "Acme Trucks")
susan = graph.addVertex(T.label, "person", "firstName", "Susan")
tom = graph.addVertex(T.label, "person", "firstName", "Tom")
truck = graph.addVertex(T.label, "vehicle", "license", "ABC123", "year", 2012)
// Connect vertices with edges
edge = acmeTrucks.addEdge("owns", truck)
edge.property("bought", 2012)
acmeTrucks.addEdge("employs", susan).property("hired", 2012)
acmeTrucks.addEdge("employs", tom).property("hired", 2014)
tom.addEdge("drives", truck)
Retrieving vertices
// Get a traversal source so that we can run some queries
g = graph.traversal(standard())
gremlin> g.V()
==>v[0]
==>v[2]
==>v[4]
==>v[6]
// Get the properties for each vertex
gremlin> g.V().valueMap()
==>[name:[Acme Trucks]]
==>[firstName:[Susan]]
==>[firstName:[Tom]]
==>[license:[ABC123], year:[2012]]
Basic vertex filtering
// Retrieve all people with firstName Susan
gremlin> g.V().hasLabel("person").has("firstName", "Susan")
==>v[2]
// Retrieve all people with firstName Susan or Tom
gremlin> g.V().hasLabel("person").has("firstName", within("Susan", "Tom"))
==>v[2]
==>v[4]
Querying adjacent edges and vertices
// Count how many people Acme Trucks employs
gremlin> g.V().hasLabel("company").has("name", "Acme Trucks").out("employs").count()
==>2
// How many employees were hired in 2012?
gremlin> g.V().hasLabel("person").where(inE("employs").has("hired", 2012)).count()
==>1
// Which employees drive a truck?
gremlin> g.V().hasLabel("company").has("name", "Acme Trucks").out("employs").as("driver").out("drives").select("driver").values("firstName")
==>Tom
// Show me all of the drivers that were hired before 2015
gremlin> g.V().hasLabel("person").and(inE("employs").values("hired").is(lt(2015)), out("drives")).values("firstName")
==>Tom
Many more steps...
● AddEdge Step
● AddVertex Step
● AddProperty Step
● Aggregate Step
● And Step
● As Step
● By Step
● Cap Step
● Coalesce Step
● Count Step
● Choose Step
● Coin Step
● CyclicPath Step
● Dedup Step
● Drop Step
● Fold Step
● Group Step
● GroupCount Step
● Has Step
● Inject Step
● Is Step
● Limit Step
● Local Step
● Match Step
● ...
GraphComputer for global graph processing
● Use cases
  ○ full graph traversal
  ○ parallel processing
  ○ batch import/export
● Examples
  ○ PageRank
  ○ vertex count
  ○ mass schema update
● Gremlin OLAP implementations
  ○ Hadoop
  ○ Spark
  ○ Giraph
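PageRank is one of the examples listed above. The sketch below is plain, single-process Python, not the GraphComputer API; it only illustrates the kind of iterative, whole-graph computation that an OLAP engine would spread across workers. The toy graph, the damping factor, and the iteration count are assumptions.

```python
def pagerank(out_edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {vertex: [out-neighbors]}."""
    vertices = list(out_edges)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Every vertex starts each round with the "teleport" share.
        new_rank = {v: (1.0 - damping) / n for v in vertices}
        for v in vertices:
            targets = out_edges[v]
            if targets:
                # Split this vertex's rank evenly over its outgoing edges.
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling vertex: spread its rank evenly over all vertices.
                share = damping * rank[v] / n
                for t in vertices:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Toy graph loosely based on the deck's example (edge labels ignored).
graph = {
    "acme": ["susan", "tom", "truck"],
    "susan": [],
    "tom": ["truck"],
    "truck": [],
}
ranks = pagerank(graph)
for vertex, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(vertex, round(score, 3))
```

Every round touches every vertex and every edge, which is why this class of query is usually run as a batch job rather than as an interactive traversal.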
Graph use cases
● Social network analysis
● Fraud detection
● Recommendation systems
● Route optimization
● IoT
● Master data management
TitanDB
● What is Titan?
● Data store options
● Deployment options
● Titan Cassandra data model
● Titan specific graph features
TitanDB
● Graph layer that can use a variety of data stores as backends depending on user requirements
  ○ HBase
  ○ Berkeley DB
  ○ Cassandra
  ○ Insert your favorite k/v, BigTable data store
Which data store is right for you?
● Things to think about
  ○ data volume
  ○ CAP
  ○ ACID
  ○ read/write requirements
  ○ ops implications
  ○ your current infrastructure
http://s3.thinkaurelius.com/docs/titan/0.5.4/benefits.html
A Titan cluster with access options
[Figure: cluster of five Titan + C* nodes, with client access paths labeled Socket (JVM, Node) and Embedded (JVM)]
● Access options
  ○ Titan < 0.9
    ■ Rexster
    ■ embedded as a dependency of your app
  ○ Titan 0.9+
    ■ Gremlin Server
    ■ embedded as a dependency of your app
  ○ Object to graph mapper
    ■ Python - Mogwai, Bulbs
    ■ JVM - Totorom, Frames
● Titan does not need to be on each node; all communication between Titan instances goes through C*
Titan installation
● Download and unzip latest milestone
● Cassandra footprint
  ○ Titan keyspace
  ○ Column families
    ■ edgestore
    ■ edgestore_lock_
    ■ graphindex
    ■ graphindex_lock_
    ■ titan_ids
    ■ ...
./bin/titan.sh start
Forking Cassandra...
Running `nodetool statusthrift`.. OK (returned exit status 0 and printed string "running").
Forking Elasticsearch...
Connecting to Elasticsearch (127.0.0.1:9300). OK (connected to 127.0.0.1:9300).
Forking Gremlin-Server...
Connecting to Gremlin-Server (127.0.0.1:8182)...... OK (connected to 127.0.0.1:8182).
Run gremlin.sh to connect.
Vertex and edge storage format
Cassandra Thrift
Titan storage format
Edge and property serialization
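The storage slides are diagrams, so here is a rough sketch of the adjacency list idea behind them: one row per vertex, with that vertex's properties and edges stored as sorted columns in the row. This is an in-memory Python model only; the tuple-shaped column names and the helper functions are assumptions, and the real format is a compact binary encoding.

```python
from collections import defaultdict

# Row key (vertex id) -> list of (column, value) cells, kept sorted by column.
rows = defaultdict(list)


def put(vertex_id, column, value):
    rows[vertex_id].append((column, value))
    rows[vertex_id].sort(key=lambda cell: cell[0])


def add_property(vertex_id, key, value):
    put(vertex_id, ("property", key), value)


def add_edge(out_id, label, in_id, **props):
    # The edge is written into both endpoint rows so that either
    # direction can be answered by reading a single row.
    put(out_id, ("edge", label, "OUT", in_id), props)
    put(in_id, ("edge", label, "IN", out_id), props)


def slice_edges(vertex_id, label, direction):
    """Read only the cells of one row that share a column prefix."""
    prefix = ("edge", label, direction)
    return [(col[3], val) for col, val in rows[vertex_id] if col[:3] == prefix]


add_property(1, "name", "Acme Trucks")
add_edge(1, "employs", 2, hired=2012)
add_edge(1, "employs", 3, hired=2014)
add_edge(3, "drives", 4)

print(slice_edges(1, "employs", "OUT"))
print(slice_edges(4, "drives", "IN"))
```

Because all of a vertex's edges sit next to each other in one row, reading a vertex's neighborhood is a single-row read, which is the access pattern Cassandra handles well.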
Schema definition
● Properties
  ○ data type - string, float, char, geoshape, etc.
  ○ cardinality - single, list, set
  ○ uniqueness (through Titan’s indexing system)
● Edges
  ○ labels
  ○ define multiplicity - one-to-one, many-to-one, one-to-many
● Vertices
  ○ labels
● Advanced
  ○ edge, vertex, and property TTL
  ○ Meta-properties - properties on properties (audit info for example)
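The three property cardinalities behave differently on repeated writes. The sketch below models that behavior in plain Python; it is not the Titan management API, and the `PropertyStore` class and the property keys are hypothetical.

```python
class PropertyStore:
    """Toy model of per-key cardinality: single, list, or set."""

    def __init__(self, cardinality):
        # cardinality maps a property key to "single", "list", or "set";
        # keys that are not listed default to "single".
        self.cardinality = cardinality
        self.values = {}

    def add(self, key, value):
        mode = self.cardinality.get(key, "single")
        if mode == "single":
            # A new value replaces the old one.
            self.values[key] = [value]
        elif mode == "list":
            # Every value is kept, duplicates included.
            self.values.setdefault(key, []).append(value)
        elif mode == "set":
            # Values are kept once each.
            current = self.values.setdefault(key, [])
            if value not in current:
                current.append(value)
        else:
            raise ValueError("unknown cardinality: " + mode)


store = PropertyStore({"firstName": "single", "nickname": "list", "skill": "set"})
for key in ("firstName", "nickname", "skill"):
    store.add(key, "a")
    store.add(key, "a")
    store.add(key, "b")

print(store.values)
```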
Global indexing options
● Supports composite keys
● Titan indexing provider
  ○ fast!
  ○ exact matches only
● External providers
  ○ Not as fast
  ○ Many options beyond exact matching (wildcards, geosearch, etc.)
  ○ providers
    ■ Elasticsearch
    ■ Lucene
    ■ Solr
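One way to picture why the built-in index is fast but limited to exact matches is a hash lookup keyed on the full tuple of indexed values. This is a Python sketch under that assumption, not Titan's implementation; the `CompositeIndex` class is hypothetical.

```python
from collections import defaultdict


class CompositeIndex:
    """Exact-match index over a fixed tuple of property keys."""

    def __init__(self, keys):
        self.keys = tuple(keys)
        self.entries = defaultdict(set)  # value tuple -> vertex ids

    def add(self, vertex_id, properties):
        # Only vertices that carry every indexed key are indexed.
        if all(k in properties for k in self.keys):
            entry = tuple(properties[k] for k in self.keys)
            self.entries[entry].add(vertex_id)

    def lookup(self, **criteria):
        # A hash lookup needs the complete key: no ranges, no wildcards,
        # and no subset of the indexed keys.
        if set(criteria) != set(self.keys):
            raise ValueError("an exact value is required for every indexed key")
        return self.entries.get(tuple(criteria[k] for k in self.keys), set())


index = CompositeIndex(["license", "year"])
index.add(6, {"license": "ABC123", "year": 2012})
index.add(8, {"license": "XYZ789", "year": 2012})

print(index.lookup(license="ABC123", year=2012))
```

Wildcard, range, or geo predicates cannot be answered by this structure, which is where the external providers come in.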
I want that one!
Vertex Centric Indices
● Adjacent edge counts can grow quite large in certain situations and form super nodes
● Supports composite keys and ordering of edges to speed up vertex centric queries
  ○ translates into slice queries of the edges
  ○ efficiently retrieve ranges of edges or satisfy top n type queries
[Figure: company vertex (name: Acme Trucks) with three employs edges, hired: 2013, 2014, and 2015]
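The range and top n behavior can be sketched as a sorted list searched with binary search. This is plain Python, not Titan code; the `VertexCentricIndex` class and the employee names are hypothetical, while the `hired` years come from the figure.

```python
import bisect


class VertexCentricIndex:
    """Edges of one vertex and one label, kept sorted by a property value."""

    def __init__(self):
        self._keys = []   # sorted sort-key values, e.g. 'hired' years
        self._edges = []  # edges in the same order as _keys

    def add(self, sort_value, edge):
        pos = bisect.bisect_right(self._keys, sort_value)
        self._keys.insert(pos, sort_value)
        self._edges.insert(pos, edge)

    def range(self, low, high):
        """Edges with low <= sort value < high, located by binary search."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_left(self._keys, high)
        return self._edges[start:end]

    def top(self, n):
        """The n edges with the largest sort values, largest first."""
        start = max(0, len(self._edges) - n)
        return self._edges[start:][::-1]


employs = VertexCentricIndex()
employs.add(2014, "employee-b")
employs.add(2013, "employee-a")
employs.add(2015, "employee-c")

print(employs.range(2014, 2016))
print(employs.top(1))
```

Neither query scans the whole edge list, which is what keeps a vertex with millions of edges queryable.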
Graph partitioning with ByteOrderedPartitioner
Vertex cuts
[Figure: a supernode with edges numbered 1 ... 2,000,000]
mgmt = graph.openManagement()
mgmt.makeVertexLabel('user').partition().make()
mgmt.commit()
Edge cuts
[Figure: Company A and Company B]
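An edge cut is an edge whose two endpoints land on different partitions, and each one can mean a cross-node hop during a traversal. The sketch below counts cut edges for two placements in plain Python; the vertex names, the edges, and both partition assignments are hypothetical, chosen to mirror the Company A / Company B figure.

```python
def count_cut_edges(edges, partition_of):
    """Number of edges whose endpoints live on different partitions."""
    return sum(1 for a, b in edges if partition_of[a] != partition_of[b])


# Two small, unconnected company subgraphs.
edges = [
    ("company_a", "a_site"), ("a_site", "a_tank"),
    ("company_b", "b_site"), ("b_site", "b_tank"),
]

# Placement 1: keep each company's vertices together.
by_company = {
    "company_a": 0, "a_site": 0, "a_tank": 0,
    "company_b": 1, "b_site": 1, "b_tank": 1,
}

# Placement 2: vertices scattered without regard to company.
scattered = {
    "company_a": 0, "a_site": 1, "a_tank": 0,
    "company_b": 1, "b_site": 0, "b_tank": 1,
}

print(count_cut_edges(edges, by_company))
print(count_cut_edges(edges, scattered))
```

Grouping by a natural boundary in the data, such as the owning company, is one plausible way to keep the cut count low.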
A bit more about WellAware
● Founded in 2012
● Full stack oil & gas monitoring solution
● iOS, Android, and web clients
● Connecting to field assets over RPMA, cellular, and satellite
Functionality and high level architecture
● Remote data collection
● Mobile data collection
● Asset control
● Derived measurements
● Alarming
● Reporting
[Figure: architecture diagram with components labeled Poller, Django, Titan, WAN, and ESB]
Moving to Titan
● 2013
  ○ Running Django against PostgreSQL and, for a while, TempoDB
● Beginning of 2014 - started using Titan 0.4.4 to capture relationships between assets and for derived measurements
● March 2014 - deployed a 3-node Cassandra cluster and moved the rest of the backend (minus auth) over to Titan 0.4.4
● Today - 3-node DC for OLTP & 2-node reporting DC
  ○ still on Titan 0.4.4, waiting for Titan 1.0 to be released and hardened
  ○ post Titan 1.0, we’re looking forward to trying out DSE Graph
A common well pad configuration
[Photo, labeled: well & pumpjack; tanks]
Sample of model
[Figure: model diagram with vertices labeled O&G Co., Tank, Site, and Top Gauge]
Zooming in on a well pad
[Figure: well, meter, separator, meter, tank, tank, compressor]
Lessons learned
● No native integration with 3rd party BI tools - reports, dashboards, ad hoc query
  ○ Apache Calcite based JDBC driver that translates SQL to graph queries
● Colocating Titan, some of your application code, and Cassandra on the same nodes: what’s the right separation?
● Out of the box framework support is lacking (no native Spring, Dropwizard support)
● Performance tuning requires knowledge of Titan AND Cassandra
● Play to Cassandra and adjacency list storage format strengths
● You can’t hide from tombstones!!!
Graph and Titan resources
● TinkerPop docs - http://www.tinkerpop.com/docs/3.0.0.M6/
● Titan docs - http://s3.thinkaurelius.com/docs/titan/0.9.0-M2/
● Titan Google group - https://groups.google.com/forum/#!forum/aureliusgraphs
● Gremlin Google group - https://groups.google.com/forum/#!forum/gremlin-users
● O’Reilly graph ebook (focuses on Neo4j but has generally applicable graph info) - http://graphdatabases.com/
● Java OGM - https://github.com/BrynCooke/totorom
● Python OGM - https://mogwai.readthedocs.org/en/latest/
Thanks and what questions do you have?