Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

CASSANDRA @ ULTRAVISUAL: Cassandra Day New York 2014. Skye Book, Lead Systems Architect

description

Cassandra has been an integral part of UltraVisual’s infrastructure since its launch, allowing us to rapidly prototype and build new features that further enhance the user experience. Over the course of this discussion we will cover three key topics: how Cassandra came to be used at UltraVisual and the key problem it solved; how our use of Cassandra has evolved alongside the product; and finally, some of our experiences deploying and running Cassandra in production.

Transcript of Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

Page 1: Cassandra Day NY 2014: Utilizing Apache Cassandra at UltraVisual

CASSANDRA @ ULTRAVISUAL

Cassandra Day New York 2014

Skye Book, Lead Systems Architect

Page 2

ULTRAVISUAL

A visual network for inspiration, expression, and collaboration

Page 3

The Feed

• A user’s first taste of UV

• More than just posts

• Constantly being tweaked and re-thought

Page 4

The Old Way

Started Simple!

“Show me recent posts in collections I follow”

SELECT DISTINCT _post.*
FROM _post
JOIN _collection_post cp ON _post.uuid = cp.post_uuid
JOIN _collection_follow cf ON cp.c_uuid = cf.collection_uuid
WHERE cf.user_id = ?
ORDER BY _post.created_at DESC
LIMIT 20 OFFSET 0

Page 5

The Old Way

Added Complexity!

“Show me people recently followed by my connections”

SELECT a.*
FROM _user_follow a, _user_follow b
WHERE b.follower = 12345
  AND a.follower = b.followed
ORDER BY a.followed_at DESC
LIMIT 20 OFFSET 0

Page 6

The Old Way

Every new feature needs another query!

Feed requests generate a disproportionate amount of load compared to normal CRUD ops

Page 7

Reframing the Problem

From This:

A place for posts, new collections, social activity, and anything else interesting

(Image source: nitro404.com/computers/knex.php)

Page 8

Reframing the Problem

To This:

A list of items interesting to the user

Page 9

The New Way

Model First

• With an SQL background, this can be misleading.

• Essential Question: “How do I need to access this data?”

Page 10

The New Way

“Try to model data as a log of user intent”

–Rick Branson, Instagram, Cassandra Summit 2013

Page 11

The New Way

user  status  created_at  story                         json
2     0       61b97280    user_follow:3:5               {"foo":"bar"}
2     1       5daa04c0    post:bfbd0a39                 {"foo":"bar"}
2     1       565752e0    collection_follow:5:d70961c1  {"foo":"bar"}
2     1       4a8189e0    user_follow:3:5               {"foo":"bar"}

Primary key: user, status, created_at, story. Cached story JSON: json.

Model for user feeds

• Fast to fetch user stories

• Cached JSON means almost zero SQL requests
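A minimal in-memory sketch of this feed model, written in Ruby (the language of the deck's Chef cookbook). It assumes the primary key is (user, status, created_at, story) as labeled above, with created_at newest first within each status; integer timestamps stand in for the TimeUUIDs, and `FeedRow`, `FeedTable`, and `live_page` are hypothetical names. Because rows sit in clustering order, the live stories form one contiguous run of the user's partition, and the cached JSON comes back as-is with no SQL lookup.

```ruby
# In-memory sketch of the user feed table (hypothetical names).
# Assumed primary key: (user, status, created_at, story); integer timestamps
# stand in for TimeUUIDs.
FeedRow = Struct.new(:user, :status, :created_at, :story, :json)

class FeedTable
  LIVE = 1

  def initialize
    # One partition (an array of rows) per user.
    @partitions = Hash.new { |hash, user| hash[user] = [] }
  end

  # Append-only write. Each partition is kept in clustering order:
  # status ascending, then created_at descending (newest first).
  def insert(row)
    partition = @partitions[row.user]
    partition << row
    partition.sort_by! { |r| [r.status, -r.created_at, r.story] }
    row
  end

  # Live stories are one contiguous run of the partition, so a page is a
  # single slice. The cached JSON is returned directly: no SQL lookup.
  def live_page(user, limit)
    @partitions.fetch(user, [])
               .drop_while { |r| r.status < LIVE }
               .take_while { |r| r.status == LIVE }
               .first(limit)
               .map(&:json)
  end
end
```

Keeping the rendered JSON in the row trades disk space for read latency, which is the trade-off the following slides come back to.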

Page 12

Fast.

Response times cut from hundreds of milliseconds to the 30 ms range

Page 13

Launch Week: Featured by Apple!

[Chart: Cluster Disk Usage, 26% / 74%]

Page 14

Don’t be too cute

cqlsh:ultravisual> ALTER TABLE latest_feed DROP json;

Page 15

Handling Deletions

• Data is only appended to user feeds, never deleted

• Adapted Instagram’s ‘Anti-Column’ solution

• Avoids missed deletions for nodes down longer than GCGraceSeconds

• Avoids the race condition where a deletion arrives before the write it deletes

Sam follows Sandy:

user  created_at  status  story
2     4a8189e0    1       user_follow:3:5

Sam unfollows Sandy:

user  created_at  status  story
2     61b97280    0       user_follow:3:5
2     4a8189e0    1       user_follow:3:5
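A rough sketch of how a reader might apply anti-columns under the layout above, in Ruby. Integer timestamps stand in for TimeUUIDs, and `FeedEntry` and `visible_stories` are hypothetical names. Entries are scanned newest first; an undo entry (status 0) hides every older live entry (status 1) for the same story, so the result does not depend on which write reached a replica first, and nothing depends on a tombstone surviving past GCGraceSeconds.

```ruby
# Sketch of the anti-column read path (hypothetical names; integer timestamps
# stand in for TimeUUIDs). Nothing is ever deleted: an unfollow appends an
# undo entry (status 0) for the same story.
FeedEntry = Struct.new(:created_at, :status, :story)

UNDO = 0
LIVE = 1

# Scan newest first. An undo entry masks every OLDER live entry for its story,
# so the outcome does not depend on the order in which writes arrived.
def visible_stories(entries)
  masked = {}
  entries.sort_by { |e| -e.created_at }.each_with_object([]) do |e, visible|
    if e.status == UNDO
      masked[e.story] = true
    elsif e.status == LIVE && !masked[e.story]
      visible << e.story
    end
  end
end
```

A newer live entry (Sam re-follows Sandy) is still shown, because the mask only applies to entries older than the undo.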

Page 16

Negated Entries

user  created_at  status  story
2     61b97280    0       user_follow:3:5
2     4a8189e0    1       user_follow:3:5

Keeps all entries in a single time series

First page can usually be populated by a single read

user  status  created_at  story
2     0       61b97280    user_follow:3:5
2     1       4a8189e0    user_follow:3:5

Splits user’s row into two lists, live and undo

Will always require at least two reads
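The read-count trade-off between the two layouts can be sketched in Ruby, assuming the clustering orders shown in the two tables (integer timestamps instead of TimeUUIDs; `StoryRow`, `first_page_a`, and `first_page_b` are hypothetical). Each method returns the visible page plus the number of slices it had to read. It is an illustration of the reasoning on the slide, not a measurement.

```ruby
# Sketch comparing the two clustering orders (hypothetical names; integer
# timestamps stand in for TimeUUIDs). Each method returns [page, reads].
StoryRow = Struct.new(:created_at, :status, :story)

UNDO_STATUS = 0
LIVE_STATUS = 1

# Layout A, clustered by (created_at, status): one newest-first time series.
# An undo row is always newer than the live row it cancels, so both fall in
# the same slice and one read usually fills the page. (If undo rows crowd the
# slice, the page comes up short and a follow-up read is needed.)
def first_page_a(rows, limit)
  slice = rows.sort_by { |r| -r.created_at }.first(limit) # read 1
  masked = {}
  page = slice.each_with_object([]) do |r, out|
    if r.status == UNDO_STATUS
      masked[r.story] = true
    elsif !masked[r.story]
      out << r.story
    end
  end
  [page, 1]
end

# Layout B, clustered by (status, created_at): live and undo rows are two
# separate runs, so a page always needs at least two slices.
def first_page_b(rows, limit)
  live_rows = rows.select { |r| r.status == LIVE_STATUS }
  live = live_rows.sort_by { |r| -r.created_at }.first(limit) # read 1
  undo = rows.select { |r| r.status == UNDO_STATUS }          # read 2
  kept = live.reject do |l|
    undo.any? { |u| u.story == l.story && u.created_at > l.created_at }
  end
  [kept.map(&:story), 2]
end
```

Both layouts produce the same page; the difference is that layout A usually gets there with one slice.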

Page 17

Further Uses

• User Notifications

• User Onboarding

• Reshare Statistics

• User & Content Reports

• API Statistics

Page 18

User Onboarding

user  created_at  sequence       step  content
2     61b97280    onboarding_v2  1     rec_collections_1
3     5daa04c0    onboarding_v2  2     rec_collections_2
5     565752e0    onboarding_v3  1     find_friends
6     4a8189e0    onboarding_v3  1     find_friends

Sequenced feed entries for users on signup
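A small Ruby sketch of generating sequenced onboarding entries at signup. The sequence definitions, `OnboardingEntry`, and `onboarding_entries` are all hypothetical; `now` stands in for a TimeUUID, and offsetting it by step is an assumption made only to keep each entry's (user, created_at) key distinct. The `step` column is what lets a client present the entries in order.

```ruby
# Hypothetical sequence definitions: each sequence is an ordered list of
# content keys; step numbers start at 1.
ONBOARDING_SEQUENCES = {
  "onboarding_v2" => %w[rec_collections_1 rec_collections_2],
  "onboarding_v3" => %w[find_friends]
}.freeze

OnboardingEntry = Struct.new(:user, :created_at, :sequence, :step, :content)

# Build the feed entries to append for a newly signed-up user.
# `now` stands in for a TimeUUID; offsetting it by step keeps each entry's
# (user, created_at) key distinct. Unknown sequences raise KeyError.
def onboarding_entries(user, sequence, now)
  ONBOARDING_SEQUENCES.fetch(sequence).map.with_index(1) do |content, step|
    OnboardingEntry.new(user, now + step - 1, sequence, step, content)
  end
end
```

Versioning the sequence name (v2, v3) lets new signups get a new flow while existing rows keep recording which flow they saw.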

Page 19

Production Experiences

Drivers

• Java: Started with Astyanax, moved to the DataStax Java driver v2

• Node.js: node-cassandra-cql

Page 20

Cryptic message with large batch updates in pre-release versions of the 2.0 driver

DataStax driver issue 229

com.datastax.driver.core.exceptions.DriverInternalError: An unexpected protocol error occured. This is a bug in this library, please report: Unknown code 256 for a consistency level

As of 2.0, batches with more than 64k statements throw a clearer exception:

java.lang.IllegalStateException: Batch statement cannot contain more than 65536 statements.
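On the application side, the straightforward fix is to keep every batch under the statement limit. A minimal Ruby sketch, assuming a cap taken from the exception message (the constant and `split_into_batches` are hypothetical); in a Java client each group would become its own `BatchStatement` and be executed separately.

```ruby
# Hypothetical per-batch cap: one below the 65,536 in the exception message,
# as a conservative margin.
MAX_STATEMENTS_PER_BATCH = 65_535

# Split statements into consecutive groups, each small enough for one batch.
def split_into_batches(statements, max = MAX_STATEMENTS_PER_BATCH)
  raise ArgumentError, "max must be positive" unless max.positive?

  statements.each_slice(max).to_a
end
```

For example, 150,000 statements become three batches of 65,535, 65,535, and 18,930 statements.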

Page 21

Compression

Just use LZ4

Page 22

Cassandra-4851

Unfortunate truth in Cassandra 2.0.5:

cqlsh:test> SELECT * FROM user_feed WHERE user = 2 AND created_at > :some_uuid AND status=0;
Bad Request: PRIMARY KEY part status cannot be restricted (preceding part created_at is either not restricted or by a non-EQ relation)
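Why the restriction? Roughly: within a partition, rows are stored in clustering order, and a query on clustering columns is served as a slice of that order. With created_at ahead of status, the rows matching `created_at > x AND status = 0` are scattered rather than contiguous. A Ruby sketch of that argument (integer timestamps; `FeedKey`, `matching_positions`, and `contiguous?` are hypothetical):

```ruby
# Sketch: which rows match `created_at > after AND status = 0`, and where do
# they sit in each clustering order? (hypothetical names; integer timestamps)
FeedKey = Struct.new(:created_at, :status)

# Positions, in the given clustering order, of the rows the query wants.
def matching_positions(sorted_rows, after)
  sorted_rows.each_with_index
             .select { |row, _i| row.created_at > after && row.status == 0 }
             .map(&:last)
end

# A single slice can only return a contiguous run of positions.
def contiguous?(positions)
  positions.each_cons(2).all? { |a, b| b == a + 1 }
end

rows = [FeedKey.new(10, 0), FeedKey.new(10, 1), FeedKey.new(20, 0),
        FeedKey.new(20, 1), FeedKey.new(30, 0)]

by_time   = rows.sort_by { |r| [r.created_at, r.status] } # (created_at, status)
by_status = rows.sort_by { |r| [r.status, r.created_at] } # (status, created_at)

contiguous?(matching_positions(by_time, 5))   # => false (positions 0, 2, 4)
contiguous?(matching_positions(by_status, 5)) # => true  (positions 0, 1, 2)
```

This is the same tension as the Negated Entries slide: the single-time-series layout reads pages cheaply, but cannot slice on status.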

Page 23

Cassandra-4851

Adds CQL3 support for vector comparison syntax:

cqlsh:test> SELECT * FROM timeline WHERE day = '21 Jun 2014' AND (hour,min) >= (3,50) AND (hour,min,sec) <= (4,37,30);

Available in 2.0.6
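The multi-column comparison is lexicographic: compare hour first, and only look at min (then sec) on a tie. Ruby's `Array#<=>` has the same semantics, so the WHERE clause above can be mirrored for rows of one day partition (`TimelineRow` and `in_window?` are hypothetical names):

```ruby
# CQL's multi-column ("vector") comparison is lexicographic, like Ruby's
# Array#<=>. This mirrors the WHERE clause above for one partition (day).
TimelineRow = Struct.new(:hour, :min, :sec)

def in_window?(row)
  ([row.hour, row.min] <=> [3, 50]) >= 0 &&
    ([row.hour, row.min, row.sec] <=> [4, 37, 30]) <= 0
end

# Not the same as `hour >= 3 AND min >= 50`: 4:10:00 is inside the window.
in_window?(TimelineRow.new(4, 10, 0))  # => true
in_window?(TimelineRow.new(3, 49, 59)) # => false
```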

Page 24

Production Experiences

Upgrades

• Manual package installs (dsc20 from DataStax)

• One node at a time

• Upgrade, wait for healthy status and normal operations, then move on

• OpsCenter provides a good overview
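The one-node-at-a-time procedure can be sketched as a loop in Ruby. This is only a sketch: `upgrade` and `healthy` are hypothetical callables the operator supplies (for example, wrapping the dsc20 package install and whatever health check you trust, such as node status in OpsCenter or `nodetool status`), the timeout values are arbitrary, and halting the rollout on an unhealthy node is our design choice rather than something the deck prescribes.

```ruby
# Sketch of a one-node-at-a-time rolling upgrade. `upgrade` and `healthy` are
# hypothetical callables supplied by the operator; nothing here talks to a
# real cluster. `sleeper` is injectable so the loop can be tested quickly.
def rolling_upgrade(nodes, upgrade:, healthy:, timeout: 600, interval: 10,
                    sleeper: ->(seconds) { sleep(seconds) })
  nodes.each do |node|
    upgrade.call(node)
    waited = 0
    # Do not touch the next node until this one reports healthy again.
    until healthy.call(node)
      raise "#{node} not healthy after #{timeout}s; stopping upgrade" if waited >= timeout

      sleeper.call(interval)
      waited += interval
    end
  end
  nodes
end
```

Stopping on the first unhealthy node keeps at most one replica in an unknown state at a time.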

Page 25

Production Experiences

Speaking of OpsCenter…

• Don’t be alarmed if nodes appear but agent data does not

• opscenterd often needs a restart after a cluster upgrade to see agents again

Page 26

Production Experiences

Service Discovery

• Running on AWS using Ec2MultiRegionSnitch

• Using OpsWorks (Amazon’s Chef service) for seed config

Page 27

Chef Cookbook

github.com/skyebook/cassandra-opsworks-chef-cookbook

• Forked from Michael Klishin’s awesome C* cookbook

• Added integration with OpsWorks’ stack.json

seed_array = []

# Add this node as the first seed
# If using the multi-region snitch, we must use the public IP address
if node["cassandra"]["snitch"] == "Ec2MultiRegionSnitch"
  seed_array << node["opsworks"]["instance"]["ip"]
else
  seed_array << node["opsworks"]["instance"]["private_ip"]
end

# Add every instance in the OpsWorks "cassandra" layer
node["opsworks"]["layers"]["cassandra"]["instances"].each do |instance_name, values|
  if node["cassandra"]["snitch"] == "Ec2MultiRegionSnitch"
    seed_array << values["ip"]
  else
    seed_array << values["private_ip"]
  end
end

set[:cassandra][:seeds] = seed_array
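The seed selection in the recipe above can be pulled out into a plain Ruby function over a stack.json-like hash, which makes it easy to test outside Chef. The hash shape mirrors the attribute paths the recipe reads; `cassandra_seeds` is a hypothetical name, and the final `uniq` is an added assumption (in case the layer's instance list also contains the current node), not part of the original cookbook.

```ruby
# Standalone sketch of the cookbook's seed selection (hypothetical function).
# `node` is a plain Hash shaped like the attributes the recipe reads.
MULTI_REGION_SNITCH = "Ec2MultiRegionSnitch"

def cassandra_seeds(node)
  # The multi-region snitch needs public IPs; otherwise use private ones.
  ip_key = node["cassandra"]["snitch"] == MULTI_REGION_SNITCH ? "ip" : "private_ip"
  seeds = [node["opsworks"]["instance"][ip_key]] # this node first
  node["opsworks"]["layers"]["cassandra"]["instances"].each_value do |values|
    seeds << values[ip_key]
  end
  # Assumption: the layer listing may include this node too, so dedupe.
  seeds.uniq
end
```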

Page 28

Questions