DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

42
Artem Chebotko Graph Data Modeling in DataStax Enterprise

Transcript of DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Page 1: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

A  r  t  e  m        C  h  e  b  o  t  k  o

Graph  Data  Modeling  in  DataStax Enterprise

Page 2: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

1 DataStax Enterprise  Graph

2 Property  Graph  Data  Model

3 Data  Modeling  Framework

4 Schema  Optimizations

2©  DataStax,  All  Rights  Reserved.

Page 3: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

DSE  Graph

• Real-­time  Graph  DBMS• Very  large  graphs• Many  concurrent  users• Proven  technologies  and  standards• OLTP  and  OLAP  capabilities

©  DataStax,  All  Rights  Reserved. 3

Page 4: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

DSE  Graph  Design

©  DataStax,  All  Rights  Reserved. 4

Graph  ApplicationsDSE  Graph

Page 5: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

DSE  Graph

Property  Graph  and  GremlinDSE  schema  API

DSE  Graph  Design

©  DataStax,  All  Rights  Reserved. 5

Graph  Applications

Page 6: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

DSE  Graph

Property  Graph  and  GremlinDSE  schema  API

DSE  Graph  Design

©  DataStax,  All  Rights  Reserved. 6

Fully  integratedbackend  technologies

Graph  Applications

Page 7: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Property  Graph  and  GremlinDSE  schema  API

DSE  Graph

DSE  Graph  Design

©  DataStax,  All  Rights  Reserved. 7

Schema,  data,  and  query  mappingsOLTP  and  OLAP  engines

Fully  integratedbackend  technologies

Graph  Applications

Page 8: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

DSE  Graph  Use  Cases

©  DataStax,  All  Rights  Reserved. 8

Customer  360

Internet  of  Things

Personalization

Recommendations

Fraud  detection

Page 9: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

1 DataStax Enterprise  Graph

2 Property  Graph  Data  Model

3 Data  Modeling  Framework

4 Schema  Optimizations

9©  DataStax,  All  Rights  Reserved.

Page 10: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Property  Graph  Data  Model

• Instance• Defined  in  Apache  TinkerPop™• Vertices,  edges,  and  properties

• Schema• Defined  in  DataStax Enterprise• Vertex  labels,  edge  labels,  and  property  keys

©  DataStax,  All  Rights  Reserved. 10

Page 11: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Vertices

©  DataStax,  All  Rights  Reserved. 11

movie

useruser

genremovie

person

Page 12: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Edges

©  DataStax,  All  Rights  Reserved. 12

movie

userrated rated

user

knows

genrebelongsTo belongsTo

actor

movie

person

Page 13: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Properties

©  DataStax,  All  Rights  Reserved. 13

movieId: m267title: Alice in Wonderlandyear: 2010duration: 108country: United States

rating: 6rating: 5genreId: g2name: Adventure

userId: u75age: 17gender: F

movieId: m16title: Alice in Wonderlandyear: 1951duration: 75country: United States

userId: u185age: 12gender: M

movie

userrated rated

user

knows

genrebelongsTo belongsTo

actor

movie

personId: p4361 name: Johnny Depp

person

Page 14: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Multi-­ and  Meta-­Properties

©  DataStax,  All  Rights  Reserved. 14

movieId: m267title: Alice in Wonderlandyear: 2010duration: 108country: United Statesproduction: [Tim Burton Animation Co., Walt Disney Productions]budget: [$150M, $200M]

m267movie

source: Bloomberg Businessweekdate: March 5, 2010

source: Los Angeles Timesdate: March 7, 2010

Page 15: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Graph  Schema

©  DataStax,  All  Rights  Reserved. 15

movieId :texttitle :textyear :intduration :intcountry :textproduction :text*

personId:textname :text

genreId :textname :text

userId :textage :intgender :text

rating :intgenrebelongsTomovieuser rated

person

cine

mat

ogra

pher

acto

r

dire

ctor

com

pose

rsc

reen

writ

er

knows

Page 16: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Importance  of  Graph  Schema

• DSE  needs  a  graph  schema  to  generate  a  C*  schema• Vertex  labels                        → tables• Property  keys                    →  columns• Graph  indexes                  → materialized  views

secondary  indexes              search  indexes

• Additional  data  validation  benefits

©  DataStax,  All  Rights  Reserved. 16

Page 17: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Schema  Mapping  Example

Property  TableCREATE TABLE user_p ( community_id int, member_id bigint, "~~property_key_id" int, "~~property_id" uuid, age int, gender text, "userId" text, "~~vertex_exists" boolean,

PRIMARY KEY (community_id, member_id, "~~property_key_id", "~~property_id"))

©  DataStax,  All  Rights  Reserved. 17

movieId :texttitle :textyear :intduration :intcountry :textproduction :text*

personId:textname :text

genreId :textname :text

userId :textage :intgender :text

rating :intgenrebelongsTomovieuser rated

person

cine

mat

ogra

pher

acto

r

dire

ctor

com

pose

rsc

reen

writ

er

knows

Page 18: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Schema  Mapping  Example

Property  TableCREATE TABLE user_p ( community_id int, member_id bigint, "~~property_key_id" int, "~~property_id" uuid, age int, gender text, "userId" text, "~~vertex_exists" boolean,

PRIMARY KEY (community_id, member_id, "~~property_key_id", "~~property_id"))

Adjacency  TableCREATE TABLE user_e ( community_id int, member_id bigint, "~~edge_label_id" int, "~~adjacent_vertex_id" blob, "~~adjacent_label_id" smallint, "~~edge_id" uuid, "~rating" int, "~~edge_exists" boolean, "~~simple_edge_id" uuid,

PRIMARY KEY (community_id, member_id, "~~edge_label_id", "~~adjacent_vertex_id", "~~adjacent_label_id", "~~edge_id"))

©  DataStax,  All  Rights  Reserved. 18

movieId :texttitle :textyear :intduration :intcountry :textproduction :text*

personId:textname :text

genreId :textname :text

userId :textage :intgender :text

rating :intgenrebelongsTomovieuser rated

person

cine

mat

ogra

pher

acto

r

dire

ctor

com

pose

rsc

reen

writ

er

knows

Page 19: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

1 DataStax Enterprise  Graph

2 Property  Graph  Data  Model

3 Data  Modeling  Framework

4 Schema  Optimizations

19©  DataStax,  All  Rights  Reserved.

Page 20: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Data  Modeling

• Process  of  organizing  and  structuring  data• Based  on  well-­defined  set  of  rules  or  methodology• Results  in  a  graph  or  database  schema• Affects  data  quality,  data  storage  and  data  retrieval

©  DataStax,  All  Rights  Reserved. 20

Page 21: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Traditional  Schema  Design

Data  Model• Conceptual  Data  Model  (CDM)

• Logical  Data  Model  (LDM)• Physical  Data  Model  (PDM)

Purpose• Understand  data  and  its  applications

• Sketch  a  graph  data  model• Optimize  physical  design

©  DataStax,  All  Rights  Reserved. 21

Page 22: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

knows

User

userIdage

gender

Movierated

rating

movieId

titleyear

duration

country

production

Genre

Person

belongsTo

involved

Actor Director Composer Screen-­‐writer

Cinema-­‐tographer

IsA

personId

name

genreId

name

Conceptual  Data  Model

• Entity  types• Relationship  types• Attribute  types

©  DataStax,  All  Rights  Reserved. 22

Page 23: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Transition  from  CDM  to  LDM  

• Both  CDM  and  LDM  are  graphs• Entity  types                              →    Vertex  labels• Relationship  types        →    Edge  labels• Attribute  types                      →    Property  keys

• Mostly  straightforward  with  a  few  nuances

©  DataStax,  All  Rights  Reserved. 23

Page 24: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

movieId :texttitle :textyear :intduration :intcountry :textproduction :text*

personId:textname :text

genreId :textname :text

userId :textage :intgender :text

rating :intgenrebelongsTomovieuser rated

person

cine

mat

ogra

pher

acto

r

dire

ctor

com

pose

rsc

reen

writ

er

knows

Logical  Data  Model

©  DataStax,  All  Rights  Reserved. 24

• Vertex  labels• Edge  labels• Property  keys

Page 25: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Keys

• Entity  type  keys  →  Property  keys• Uniqueness  is  not  enforced• Vertex  IDs  are  auto-­generated

• Entity  type  keys  →  Custom  vertex  IDs• Uniqueness  is  enforced• Overriding  default  partitioning• Advanced  feature

©  DataStax,  All  Rights  Reserved. 25

User

userId :textage :intgender :text

user

userIdage

gender

Page 26: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Symmetric  Relationships

©  DataStax,  All  Rights  Reserved. 26

User Movieratedmovieuser

ratedwasRatedBy

movieuserwasRatedBy

movieuser

rated

wasRatedBy

Page 27: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Bi-­Directional  Relationships

©  DataStax,  All  Rights  Reserved. 27

User

knows

user

knows

userknows

user

userknows

user

user knows user

knows

Page 28: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Qualified  Bi-­Directional  Relationships

©  DataStax,  All  Rights  Reserved. 28

strength :int

User

likes

user

likes

userlikes

user

userlikes

user

userlikes

user

likesstrength

strength: 7

strength: 9

strength: 7

strength: 9

Page 29: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Hierarchies

©  DataStax,  All  Rights  Reserved. 29

Movie

involved

Person

IsA

Actor Director

movie

person

directoractor

involved

isA isA

movie

person

involved

role:text

movie

person

actor

director

movie

person

directoractor

isA isA

involved

involved

Page 30: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Physical  Data  Modelschema.propertyKey("userId").Text().create()

schema.propertyKey("name").Text().create()

schema.propertyKey("age").Int().create()

schema.vertexLabel("user").properties("userId","age",…).create()

schema.vertexLabel("movie").properties("movieId",…).create()

schema.edgeLabel("knows").connection("user","user").create()

schema.edgeLabel("rated").single().properties("rating")

.connection("user","movie").create()

©  DataStax,  All  Rights  Reserved. 30

Page 31: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

1 DataStax Enterprise  Graph

2 Property  Graph  Data  Model

3 Data  Modeling  Framework

4 Schema  Optimizations

31©  DataStax,  All  Rights  Reserved.

Page 32: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Optimizing  PDM  for  Performance

• Indexing  data• Controlling  partitioning• Materializing  aggregates  and  inferences• Rewriting  traversals

©  DataStax,  All  Rights  Reserved. 32

Page 33: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Vertex  Indexesschema.vertexLabel("movie")

.index("moviesById")

.materialized()

.by("movieId")

.add()

g.V().has("movie","movieId","m267")

©  DataStax,  All  Rights  Reserved. 33

movieId :texttitle :textyear :intduration :intcountry :textproduction :text*

movie

Page 34: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Property  Indexesschema.vertexLabel("movie")

.index("movieBudgetBySource")

.property("budget")

.by("source")

.add()

g.V().has("movie","movieId","m267")

.properties("budget")

.has("source","Los Angeles Times").value()

©  DataStax,  All  Rights  Reserved. 34

movieId: m267title: Alice in Wonderlandyear: 2010duration: 108country: United Statesproduction: [Tim Burton Animation Co., Walt Disney Productions]budget: [$150M, $200M]

movie

source: Bloomberg Businessweekdate: March 5, 2010

source: Los Angeles Timesdate: March 7, 2010

Page 35: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Edge  Indexesschema.vertexLabel("user")

.index("toMoviesByRating")

.outE("rated")

.by("rating")

.add()

g.V().has("user","userId","u1")

.outE("rated").has("rating",gt(6)).inV()

©  DataStax,  All  Rights  Reserved. 35

rating: 7movieuser rated

rating: 9

movierated

rating: 7movie

rated

Page 36: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

movie_p

year Kcountry KmovieId C↑~~property_key_id C↑~~property_id C↑durationtitle~~vertex_exists

Custom  Partitioningschema.vertexLabel("movie")

.partitionKey("year","country")

.clusteringKey("movieId")

.properties("title","duration")

.create()

©  DataStax,  All  Rights  Reserved. 36

movie_e

year Kcountry KmovieId C↑~~edge_label_id C↑~~adjacent_vertex_id C↑~~adjacent_label_id C↑~~edge_id C↑~~edge_exists~~simple_edge_id

Page 37: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

movieId :texttitle :textyear :intduration :intcountry :textproduction :text*avg :float

movie

Materializing  Aggregatesg.V().hasLabel("movie")

.property("avg",_.inE("rated")

.values("rating")

.mean())

©  DataStax,  All  Rights  Reserved. 37

Page 38: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Materializing  Inferencesg.V().has("person","name","Tom Hanks").as("tom")

.in("actor").out("actor").where(neq("tom")).dedup()

.addE("knows").from("tom")

©  DataStax,  All  Rights  Reserved. 38

movietomperson

actor

person

person

actor

actor

knows

knows

Page 39: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Rewriting  Traversals

• Equivalent  results• Different  execution  plans• Different  response  times

©  DataStax,  All  Rights  Reserved. 39

g.V().has("movie","year",2010).out("actor").has("name","Johnny Depp").count()

g.V().has("person","name","Johnny Depp").in("actor").has("year",2010).count()

Page 40: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Profiling  Traversals

©  DataStax,  All  Rights  Reserved. 40

Page 41: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

Thank  You

©  DataStax,  All  Rights  Reserved. 41

Artem [email protected]/in/artemchebotko

Page 42: DataStax | Graph Data Modeling in DataStax Enterprise (Artem Chebotko) | Cassandra Summit 2016

The  End