Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

32
A Cassandra Data Model for Serving up Cat Videos Luke Tillman (@LukeTillman) Language Evangelist at DataStax

description

Keyboard Cat, Nyan Cat, and of course the world famous Grumpy Cat--it seems like the Internet can’t get enough cat videos. If you were building an application to let users share and consume their fill of videos, how would you go about it? In this talk, we’ll take a look at the data model for KillrVideo, a sample video sharing application similar to YouTube where users can share videos, comment, rate them, and more. You’ll learn get a practical introduction to Cassandra data modeling, querying with CQL, how the application drives the data model, and how to shift your thinking from the relational world you probably have experience with.

Transcript of Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Page 1: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

A Cassandra Data Model for Serving up Cat Videos Luke Tillman (@LukeTillman) Language Evangelist at DataStax

Page 2: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Who are you?! •  Evangelist with a focus on the .NET Community •  Long-time Developer •  Recently presented at Cassandra Summit 2014 with Microsoft

•  Very Recent Denver Transplant 2

Page 3: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

1 What is this KillrVideo thing you speak of?

2 Know Thy Data and Thy Queries

3 Denormalize All the Things

4 Building Healthy Relationships

5 Knowing Your Limits

3

Page 4: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

What is this KillrVideo thing you speak of?

4

Page 5: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

KillrVideo, a Video Sharing Site

•  Think a YouTube competitor –  Users add videos, rate them, comment on them, etc. –  Can search for videos by tag

Page 6: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

See the Live Demo, Get the Code

•  Live demo available at http://www.killrvideo.com –  Written in C# –  Live Demo running in Azure –  Open source: https://github.com/luketillman/killrvideo-csharp

•  Interesting use case because of different data modeling challenges and the scale of something like YouTube

–  More than 1 billion unique users visit YouTube each month –  100 hours of video are uploaded to YouTube every minute

6

Page 7: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Just How Popular are Cats on the Internet?

7 http://mashable.com/2013/07/08/cats-bacon-rule-internet/

Page 8: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Just How Popular are Cats on the Internet?

8 http://mashable.com/2013/07/08/cats-bacon-rule-internet/

Page 9: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Know Thy Data and Thy Queries

Page 10: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Getting to Know Your Data

•  What things do I have in the system?

•  What are the relationships between them?

•  This is your conceptual data model

•  You already do this in the RDBMS world

Page 11: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Some of the Entities and Relationships in KillrVideo

11

User

id

firstname

lastname

email

password Video

id

name

description

location

preview_image

tags features

Comment comment

id

adds

timestamp

posts

timestamp

1

n n

1

1

n n

m rates

rating

Page 12: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Getting to Know Your Queries

•  What are your application’s workflows?

•  How will I access the data?

•  Knowing your queries in advance is NOT optional

•  Different from RDBMS because I can’t just JOIN or create a new indexes to support new queries

12

Page 13: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Some Application Workflows in KillrVideo

13

User Logs into site

Show basic information about user

Show videos added by a

user

Show comments

posted by a user

Search for a video by tag

Show latest videos added

to the site

Show comments for a video

Show ratings for a video

Show video and its details

Page 14: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Some Queries in KillrVideo to Support Workflows

14

Users

User Logs into site

Find user by email address

Show basic information about user

Find user by id

Comments

Show comments for a video

Find comments by video (latest first)

Show comments

posted by a user

Find comments by user (latest first)

Ratings

Show ratings for a video Find ratings by video

Page 15: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Some Queries in KillrVideo to Support Workflows

15

Videos

Search for a video by tag Find video by tag

Show latest videos added

to the site

Find videos by date (latest first)

Show video and its details

Find video by id Show videos added by a

user

Find videos by user (latest first)

Page 16: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Denormalize All the Things

Page 17: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Breaking the Relational Mindset

•  Disk is cheap now

•  Writes in Cassandra are FAST

•  Many times we end up with a “table per query”

•  Take advantage of atomic batches to write to multiple tables

•  Similar to materialized views from the RDBMS world

17

Page 18: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Users – The Relational Way

•  Single Users table with all user data and an Id Primary Key

•  Add an index on email address to allow queries by email

User Logs into site

Find user by email address

Show basic information about user

Find user by id

Page 19: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Users – The Cassandra Way

User Logs into site

Find user by email address

Show basic information about user

Find user by id

CREATE  TABLE  user_credentials  (          email  text,          password  text,          userid  uuid,          PRIMARY  KEY  (email)  );  

CREATE  TABLE  users  (          userid  uuid,          firstname  text,          lastname  text,          email  text,          created_date  timestamp,          PRIMARY  KEY  (userid)  );  

Page 20: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Videos Everywhere!

20

Show video and its details

Find video by id Show videos added by a

user

Find videos by user (latest first)

CREATE  TABLE  videos  (          videoid  uuid,          userid  uuid,          name  text,          description  text,          location  text,          location_type  int,          preview_image_location  text,          tags  set<text>,          added_date  timestamp,          PRIMARY  KEY  (videoid)  );  

CREATE  TABLE  user_videos  (          userid  uuid,          added_date  timestamp,          videoid  uuid,          name  text,          preview_image_location  text,          PRIMARY  KEY  (userid,                    added_date,  videoid)  )    WITH  CLUSTERING  ORDER  BY  (          added_date  DESC,            videoid  ASC);  

Page 21: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Videos Everywhere!

Considerations When Duplicating Data •  Can the data change? •  How likely is it to change or how frequently will it change? •  Do I have all the information I need to update duplicates and

maintain consistency?

21

Search for a video by tag Find video by tag

Show latest videos added

to the site

Find videos by date (latest first)

Page 22: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Building Healthy Relationships

Page 23: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Modeling Relationships – Collection Types

•  Cassandra doesn’t support JOINs, but your data will still have relationships (and you can still model that in Cassandra)

•  One tool available is CQL collection types CREATE  TABLE  videos  (          videoid  uuid,          userid  uuid,          name  text,          description  text,          location  text,          location_type  int,          preview_image_location  text,          tags  set<text>,          added_date  timestamp,          PRIMARY  KEY  (videoid)  );  

Page 24: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Modeling Relationships – Client Side Joins

24

CREATE  TABLE  videos  (          videoid  uuid,          userid  uuid,          name  text,          description  text,          location  text,          location_type  int,          preview_image_location  text,          tags  set<text>,          added_date  timestamp,          PRIMARY  KEY  (videoid)  );  

CREATE  TABLE  users  (          userid  uuid,          firstname  text,          lastname  text,          email  text,          created_date  timestamp,          PRIMARY  KEY  (userid)  );  

Currently requires query for video, followed by query for user by id based on results of first query

Page 25: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Modeling Relationships – Client Side Joins

•  What is the cost? Might be OK in small situations

•  Do NOT scale

•  Avoid when possible

25

Page 26: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Modeling Relationships – Client Side Joins

26

CREATE  TABLE  videos  (          videoid  uuid,          userid  uuid,          name  text,          description  text,          ...          user_firstname  text,          user_lastname  text,          user_email  text,          PRIMARY  KEY  (videoid)  );  

CREATE  TABLE  users_by_video  (          videoid  uuid,          userid  uuid,          firstname  text,          lastname  text,          email  text,          PRIMARY  KEY  (videoid)  );  

or

Page 27: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Modeling Relationships – Client Side Joins

•  Remember the considerations when you duplicate data

•  What happens if a user changes their name or email address?

•  Can I update the duplicated data?

27

Page 28: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Knowing Your Limits

Page 29: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Cassandra Rules Can Impact Your Design

•  Video Ratings – use counters to track sum of all ratings and count of ratings

•  Counters are a good example of something with special rules

CREATE  TABLE  videos  (          videoid  uuid,          userid  uuid,          name  text,          description  text,          ...          rating_counter  counter,          rating_total  counter,          PRIMARY  KEY  (videoid)  );  

CREATE  TABLE  video_ratings  (          videoid  uuid,          rating_counter  counter,          rating_total  counter,          PRIMARY  KEY  (videoid)  );  

Page 30: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Single Nodes Have Limits Too

•  Latest videos are bucketed by day

•  Means all reads/writes to latest videos are going to same partition (and thus the same nodes)

•  Could create a hotspot

30

Show latest videos added

to the site

Find videos by date (latest first)

CREATE  TABLE  latest_videos  (          yyyymmdd  text,          added_date  timestamp,          videoid  uuid,          name  text,          preview_image_location  text,          PRIMARY  KEY  (yyyymmdd,                  added_date,  videoid)  )  WITH  CLUSTERING  ORDER  BY  (          added_date  DESC,            videoid  ASC  );  

Page 31: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Single Nodes Have Limits Too

•  Mitigate by adding data to the Partition Key to spread load

•  Data that’s already naturally a part of the domain

–  Latest videos by category?

•  Arbitrary data, like a bucket number

–  Round robin at the app level

31

Show latest videos added

to the site

Find videos by date (latest first)

CREATE  TABLE  latest_videos  (          yyyymmdd  text,          bucket_number  int,          added_date  timestamp,          videoid  uuid,          name  text,          preview_image_location  text,          PRIMARY  KEY  (                  (yyyymmdd,  bucket_number)                  added_date,  videoid)  )  ...  

Page 32: Cassandra Day Denver 2014: A Cassandra Data Model for Serving up Cat Videos

Questions?

32

Follow me on Twitter for updates or to ask questions later: @LukeTillman