Cassandra and TitanDB: Insights into DataStax's Graph Strategy

Robin Schumacher – VP Products

Dr. Matthias Broecheler – Director of Engineering

Agenda

• Overview of DataStax

• Introduction to Graph

• Comparing Graph to an RDBMS

• A Look at DataStax’s Graph Strategy

• Next Steps

©2015 DataStax

Overview

Founded in April 2010

Santa Clara, Austin, New York, London, Paris, Tokyo, Sydney

410+ Employees

450+ Customers

30 Percent

Evolution of Data Management

1970s (Mainframe) and 1990s (Client-Server): infrastructure centric

• Monolithic hardware

• Centralized workloads

• Vendor lock-in

• General purpose databases (one size fits all)

• Isolated / semi-connected

Today (Cloud, Mobile, Social): application / data centric

• Commodity hardware

• Distributed workloads

• Massive scalability

• Radically connected

Cassandra – NoSQL for Modern Enterprise Workloads

Always on

Fully distributed

Best in scale and performance

80%+ of contributions come from DataStax

Free tools and drivers

Free training

[Map labels: San Francisco, Stockholm, New York]

Enabling The Internet Enterprise with DataStax Enterprise

Introduction to Graph

What is a Graph Database?

High level: Used to manage highly connected or complex data

User level: Used to support traversal and analytic queries against a data model that uses vertices, edges and properties to represent and store data

Technical level: Uses specialized index structures, data partitioning techniques, and query optimizers to efficiently traverse large graphs


[Figure: example property graph. Vertices: DataStax, DataBricks, Spark, DSE, Cassandra, Jonathan Ellis, Robin Schumacher, Billy Bosworth. Edge labels: worksFor, develops, uses, reportsTo. Properties: title: VP Product, title: CTO, title: CEO.]


[Figure: the example property graph again (DataStax, DataBricks, Spark, DSE, Cassandra, Jonathan Ellis, Robin Schumacher, Billy Bosworth; edges worksFor, develops, uses, reportsTo), with callouts identifying a Vertex, an Edge, and a Property.]
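
As a rough illustration of the vertex / edge / property model in the figure, here is a minimal in-memory sketch in Python. The names `PropertyGraph`, `add_vertex`, `add_edge`, and `out` are hypothetical and not part of Titan or TinkerPop. The edge endpoints, and the choice to store `title` on the `worksFor` edge, are assumptions: the slide text preserves the labels but not how they are wired together.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy property graph: vertices and edges both carry key/value properties."""

    def __init__(self):
        self.vertices = {}                  # vertex id -> properties dict
        self.out_edges = defaultdict(list)  # source id -> [(label, target id, properties)]

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, src, label, dst, **props):
        # Both endpoints must already exist as vertices.
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError("both endpoints must be added as vertices first")
        self.out_edges[src].append((label, dst, props))

    def out(self, vid, label):
        """Follow outgoing edges with the given label; return target vertex ids."""
        return [dst for (lbl, dst, _) in self.out_edges[vid] if lbl == label]


g = PropertyGraph()
for name in ("DataStax", "DSE", "Cassandra", "Robin Schumacher"):
    g.add_vertex(name)

# Assumed wiring: the labels come from the figure, the endpoints are a guess.
g.add_edge("Robin Schumacher", "worksFor", "DataStax", title="VP Product")
g.add_edge("DataStax", "develops", "DSE")
g.add_edge("DSE", "uses", "Cassandra")

# Two-hop traversal: what does Robin Schumacher's employer develop?
employer_products = [
    product
    for employer in g.out("Robin Schumacher", "worksFor")
    for product in g.out(employer, "develops")
]
print(employer_products)
```

Each hop is a lookup in the source vertex's own edge list, so chaining hops does not require scanning unrelated data.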

A Graph Database Helps Answer Queries Like…

…should an initiated transaction be considered fraudulent or malicious based on past user actions or normal patterns of system behavior?

…what products or actions should we recommend to a user based on their preferences and behavioral patterns to maximize sales or user engagement?

…what campaigns should be run for different segments of a company’s customer base?

Comparing Graph DB to RDBMS

Key Difference Between Graph DB and RDBMS

RDBMS: Process to query data elements (joins) is inefficient on large data sets or many relationships

Graph DB: Better performance for relationship queries due to specialized index structures

RDBMS: Expressing JOIN-intensive queries in SQL is time-consuming and error-prone

Graph DB: Intuitive query language enabling faster application development

RDBMS vs. Graph DB: Query Complexity

RDBMS (SQL):

SELECT TOP (5) [t14].[ProductName]
  FROM (SELECT COUNT(*) AS [value],
               [t13].[ProductName]
          FROM [customers] AS [t0]
   CROSS APPLY (SELECT [t9].[ProductName]
                  FROM [orders] AS [t1]
            CROSS JOIN [order details] AS [t2]
            INNER JOIN [products] AS [t3]
                    ON [t3].[ProductID] = [t2].[ProductID]
            CROSS JOIN [order details] AS [t4]
            INNER JOIN [orders] AS [t5]
                    ON [t5].[OrderID] = [t4].[OrderID]
             LEFT JOIN [customers] AS [t6]
                    ON [t6].[CustomerID] = [t5].[CustomerID]
            CROSS JOIN ([orders] AS [t7]
                        CROSS JOIN [order details] AS [t8]
                        INNER JOIN [products] AS [t9]
                                ON [t9].[ProductID] = [t8].[ProductID])
                 WHERE NOT EXISTS(SELECT NULL AS [EMPTY]
                                    FROM [orders] AS [t10]
                              CROSS JOIN [order details] AS [t11]
                              INNER JOIN [products] AS [t12]
                                      ON [t12].[ProductID] = [t11].[ProductID]
                                   WHERE [t9].[ProductID] = [t12].[ProductID]
                                     AND [t10].[CustomerID] = [t0].[CustomerID]
                                     AND [t11].[OrderID] = [t10].[OrderID])
                   AND [t6].[CustomerID] <> [t0].[CustomerID]
                   AND [t1].[CustomerID] = [t0].[CustomerID]
                   AND [t2].[OrderID] = [t1].[OrderID]
                   AND [t4].[ProductID] = [t3].[ProductID]
                   AND [t7].[CustomerID] = [t6].[CustomerID]
                   AND [t8].[OrderID] = [t7].[OrderID]) AS [t13]
         WHERE [t0].[CustomerID] = N'ALFKI'
      GROUP BY [t13].[ProductName]) AS [t14]
ORDER BY [t14].[value] DESC

VS.

Graph DB (Gremlin):

g.V('customerId','ALFKI').as('customer')
 .out('ordered').out('contains').out('is').as('products')
 .in('is').in('contains').in('ordered').except('customer')
 .out('ordered').out('contains').out('is').except('products')
 .groupCount().cap().orderMap(T.decr)[0..<5].productName
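
To make the logic shared by both queries concrete, the following Python sketch walks the same path over a tiny hand-made dataset: from a customer to the products they ordered, to other customers who ordered those products, to the products those customers ordered, minus what the first customer already has, counted and ranked. The dataset, the `recommend` function, and the collapsing of the `ordered`, `contains`, and `is` hops into a single mapping are all simplifying assumptions for illustration, not part of the original slides.

```python
from collections import Counter

# Hypothetical sample data: customer id -> products they have ordered.
# (The ordered -> contains -> is hops of the Gremlin query are collapsed
# into one mapping to keep the sketch short.)
purchases = {
    "ALFKI": ["Chai", "Tofu"],
    "BONAP": ["Chai", "Ikura", "Konbu"],
    "CHOPS": ["Tofu", "Ikura"],
    "DRACD": ["Pavlova"],
}


def recommend(customer, purchases, limit=5):
    """Rank products bought by customers who share a purchase with `customer`."""
    own = set(purchases[customer])
    counts = Counter()
    for product in purchases[customer]:          # out: customer -> products
        for other, theirs in purchases.items():  # in: products -> other customers
            if other == customer or product not in theirs:
                continue                         # except('customer')
            for candidate in theirs:             # out: other customers -> products
                if candidate not in own:         # except('products')
                    counts[candidate] += 1       # groupCount()
    return [name for name, _ in counts.most_common(limit)]


print(recommend("ALFKI", purchases))
```

In this sample, "Ikura" is reached along two paths (via "Chai" and via "Tofu") and "Konbu" along one, so "Ikura" ranks first.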

RDBMS vs. Graph DB: Data Modeling

[Figure: the SQL query from the "Query Complexity" slide, repeated for comparison (VS.) with the graph data model; the graph side is not recoverable.]

Comparing Graph DB to NoSQL

Key Difference Between Graph DB and NoSQL

NoSQL: Data model can’t represent relationships between rows or documents, requiring application developers to maintain those inside the application, which is cumbersome, inefficient, and error prone

Graph DB: Natively supports relationships in the data model and provides a query language to efficiently retrieve them

NoSQL vs. Graph DB: Query Expressivity

NoSQL: ? (requires application code)

VS.

Graph DB (Gremlin):

g.V('customerId','ALFKI').as('customer')
 .out('ordered').out('contains').out('is').as('products')
 .in('is').in('contains').in('ordered').except('customer')
 .out('ordered').out('contains').out('is').except('products')
 .groupCount().cap().orderMap(T.decr)[0..<5].productName
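
A hedged sketch of what "requires application code" can mean in practice: with a store that only offers get/put by key, the application has to write each relationship in both directions and stitch the hops together itself. `KeyValueStore`, `record_order`, `also_bought`, and the key naming scheme are hypothetical stand-ins, not the API of any particular NoSQL product.

```python
class KeyValueStore:
    """Stand-in for a store that only supports get/put by key (hypothetical)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key, [])

    def put(self, key, value):
        self._data[key] = value


def record_order(store, customer, product):
    # The application must write the relationship in both directions itself;
    # forgetting either write silently breaks the reverse lookup.
    store.put(f"products_of:{customer}", store.get(f"products_of:{customer}") + [product])
    store.put(f"customers_of:{product}", store.get(f"customers_of:{product}") + [customer])


def also_bought(store, customer):
    """Products ordered by customers who share a product with `customer`."""
    own = set(store.get(f"products_of:{customer}"))
    found = set()
    for product in own:                                    # one lookup per product
        for other in store.get(f"customers_of:{product}"):
            if other != customer:                          # one lookup per neighbour
                found.update(store.get(f"products_of:{other}"))
    return sorted(found - own)


store = KeyValueStore()
for customer, product in [("ALFKI", "Chai"), ("BONAP", "Chai"), ("BONAP", "Ikura")]:
    record_order(store, customer, product)

print(also_bought(store, "ALFKI"))
```

Every hop becomes a separate lookup issued by the application, and consistency between the forward and reverse keys is the application's responsibility.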

A Look at DataStax’s Graph Strategy

Product Strategy for 2015

• Part of DataStax’s product strategy in 2015 will be to support multiple data models in DataStax Enterprise (DSE)

• Support for multi-model will occur across several releases of DSE in 2015

Why Multi-Model in DataStax Enterprise?

Mixed workload needed (transactions, analytics, search)? Solved in DSE

Mixed model needed (wide row, graph, JSON)? Solved in DSE

Why Graph?

• Best answer for applications having highly connected data

• Key enabler of systems of engagement and systems of insight applications

• Use cases include:

  • Personalization

  • Social engagement systems (e.g. matchmaking services, contacts catalogs, etc.)

  • Fraud detection

  • Financial analysis

  • Security analysis

  • Communication

  • Supply chain management

Titan – the Foundation for DSE Graph

• Titan is a scalable, distributed graph database that is optimized for storing, traversing and querying complex graph data in real time

• Titan is open source and licensed under the Apache License 2.0

• Current technical benefits include:

  • Built on top of Cassandra, HBase, and BerkeleyDB

  • Scale-out and multi-data center capable

  • Able to support thousands of concurrent users and billions of graph data points

  • Analytics on graph data supported via Hadoop integration

  • Search enabled via support for Solr, Lucene, and Elasticsearch

What is DataStax Enterprise Graph?

DSE Graph is a scalable graph database solution for modern Web and mobile applications that need to manage highly connected data

DSE Graph will be deeply integrated into the DSE platform:

• Tight Cassandra integration

• Graph analytics powered by Spark

• DSE Search support

• OpsCenter monitoring

2015 Plans for Titan / DSE Graph

• DataStax will contribute to TinkerPop and is dedicated to making it the #1 open source graph framework

• Release Titan 1.0 (compatible with TinkerPop 3 (TP3), a prerequisite that is coming out 1-2 months before)

• First release of DSE Graph to occur in DSE 5.0. EAP builds will be available for interested customers

• Customers are advised to continue developing with TinkerPop to ensure seamless compatibility with DSE Graph

• DataStax to provide utilities/instructions for moving existing Titan databases to DSE Graph

Next Steps

• Check the DataStax blog for updates on DSE Graph

• If you are a current DSE customer, contact us about participating in upcoming Early Adopter Program (EAP) releases of DSE Graph

• If you haven’t tried DSE yet, download it from http://www.datastax.com/download and follow our getting started guide in your own environment (or use the DataStax Sandbox)

Thank you!

Questions?