Introduction to Apache Cassandra


Luke Tillman (@LukeTillman)

Language Evangelist at DataStax

Who are you?!

• Evangelist with a focus on the .NET Community

• Long-time Developer

• Recently presented at Cassandra Summit 2014 with Microsoft

• Very Recent Denver Transplant


DataStax and Cassandra

• DataStax Enterprise – Apache Cassandra, now with more QA!

– Easy integrations with Solr, Apache Spark, Hadoop

• Dev and Ops Tooling – DevCenter IDE, OpsCenter

• Open source drivers – Java, C#, Python, C++, Ruby, Node.js

• Unlimited, free use of DataStax Enterprise

• No limit on number of nodes or other hidden restrictions

• If you’re a startup, it’s free.

• Requirements:

– < $2M annual revenue, < $20M capital raised


www.datastax.com/startups

1 What is Cassandra?

2 How does it work?

3 Cassandra Query Language (CQL)

4 Who’s using it?

5 Questions


What is Cassandra?


What is Cassandra?

• A Linearly Scaling and Fault Tolerant Distributed Database

• Fully Distributed

– Data spread over many nodes

– All nodes participate in a cluster

– All nodes are equal

– No SPOF (shared nothing)


What is Cassandra?

• Linearly Scaling

– Have More Data? Add more nodes.

– Need More Throughput? Add more nodes.


http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

What is Cassandra?

• Fault Tolerant

– Nodes Down != Database Down

– Datacenter Down != Database Down


What is Cassandra?

• Fully Replicated

• Clients write local

• Data syncs across WAN

• Replication Factor per DC

[Diagram: two datacenters, US and Europe, with a Client]

Cassandra and the CAP Theorem

• The CAP Theorem limits what distributed systems can do

• Consistency

• Availability

• Partition Tolerance

• Limits? “Pick 2 out of 3”



Consistency

• When I ask the same question to any part of the system, I should get the same answer

[Figure: asked “Is he guilty yet?”, all three nodes answer “No.” (Consistent)]


[Figure: asked “Is he guilty yet?”, one node answers “No.” and two answer “Yes.” (Not Consistent)]


Availability

• When I ask a question, I will get an answer

[Figure: asked “Is he guilty yet?”, the system answers “Yes.” (Available)]


[Figure: asked “Is he guilty yet?”, the system answers “I don’t know, we have to wait for Dreamy to wake up.” (Not Available)]


Partition Tolerance

• I can ask questions even when the system is having intra-system communication problems.

[Figure: the nodes are split into Team Tyrion and Team Cersei; asked “Is he guilty yet?”, a node still answers “No.” (Tolerant)]


[Figure: the nodes are split into Team Tyrion and Team Cersei; asked “Is he guilty yet?”, a node answers “I’m not sure without asking them and we’re not speaking (I’m pretty sure that one helped kill my sister).” (Not Tolerant)]


• Cassandra is an AP system that is Eventually Consistent

[Figure: asked “Is he guilty yet?”, the nodes answer “No.”, “Wait, he’s going to take the black.”, “Yes.” and “No.” (Eventually Consistent)]


[Figure: asked “Is he guilty yet?” again later, all three nodes answer “Yes.” (Eventually Consistent)]

How does it work?


Two knobs control Cassandra fault tolerance

• Replication Factor (server side)

– How many copies of the data should exist?

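[Diagram: Client sends “Write A” to a four-node ring with RF=3. Node A also holds replicas of C and D, node B of A and D, node C of A and B, and node D of B and C, so A is stored on nodes A, B and C.]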

Two knobs control Cassandra fault tolerance

• Consistency Level (client side)

– How many replicas do we need to hear from before we acknowledge?

[Diagram: Client sends “Write A” to the four-node ring, once with CL=QUORUM and once with CL=ONE.]

Consistency Levels

• Applies to both Reads and Writes (i.e. is set on each query)

• ONE – one replica from any DC

• LOCAL_ONE – one replica from local DC

• QUORUM – a majority (more than half) of replicas across all DCs

• LOCAL_QUORUM – a majority (more than half) of replicas in the local DC

• ALL – all replicas

• TWO – two replicas from any DC
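The arithmetic behind these levels can be sketched in a few lines of Python. This is a minimal single-datacenter illustration, not driver code: the function names are made up for this example, and the LOCAL_ variants are left out because they need a per-DC replication factor.

```python
def required_acks(consistency_level: str, replication_factor: int) -> int:
    """Replica responses the coordinator waits for (single-DC sketch)."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "QUORUM": replication_factor // 2 + 1,  # strict majority of replicas
        "ALL": replication_factor,
    }
    return levels[consistency_level]


def replicas_that_can_fail(consistency_level: str, replication_factor: int) -> int:
    """Replicas that may be down while the request can still be satisfied."""
    return replication_factor - required_acks(consistency_level, replication_factor)


for cl in ("ONE", "QUORUM", "ALL"):
    print(cl, required_acks(cl, 3), replicas_that_can_fail(cl, 3))
```

With RF=3 a quorum is 2 replicas, so one replica can be unavailable; with RF=4 a quorum is 3, which is why "51%" is only an approximation of "majority".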


Consistency Level and Speed

• The number of replicas we need to hear from affects how quickly we can read and write data in Cassandra

[Diagram: Client sends “Read A” (CL=QUORUM) to the four-node ring; the acknowledgements are labelled 5 µs, 300 µs, 12 µs and 12 µs.]
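One way to see why the consistency level affects speed: the coordinator can respond as soon as the required number of replicas has acknowledged, so the slowest replica only matters at ALL. The sketch below assumes three replicas with ack times loosely borrowed from the diagram's labels; which replica produced which figure is an assumption, and `response_time_us` is a name invented for this example.

```python
def response_time_us(ack_times_us, required_acks):
    """Time until the required number of replicas has acknowledged,
    i.e. the required_acks-th fastest acknowledgement."""
    return sorted(ack_times_us)[required_acks - 1]


acks = [5, 12, 300]  # hypothetical per-replica ack times in microseconds (RF=3)

print(response_time_us(acks, 1))  # CL=ONE waits for the fastest replica
print(response_time_us(acks, 2))  # CL=QUORUM waits for the second fastest
print(response_time_us(acks, 3))  # CL=ALL waits for the slowest replica
```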

Consistency Level and Availability

• Consistency Level choice affects availability

• For example, with RF=3, QUORUM can tolerate one replica being down and still be available

[Diagram: Client sends “Read A” (CL=QUORUM) to the four-node ring; the responses are labelled A=2.]

Consistency Level and Eventual Consistency

• Cassandra is an AP system that is Eventually Consistent, so replicas may disagree

• Column values are timestamped

• In Cassandra, Last Write Wins (LWW)

[Diagram: Client sends “Read A” (CL=QUORUM) to the four-node ring; the labels read A=2 (Newer), A=1 (Older) and A=2.]

Christos from Netflix: “Eventual Consistency != Hopeful Consistency”

https://www.youtube.com/watch?v=lwIA8tsDXXE
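Last Write Wins can be sketched as picking the response with the highest timestamp. The `Cell` type, the `resolve` function and the timestamp values are illustrative assumptions, not Cassandra internals.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    value: int
    timestamp: int  # write timestamp; the numbers used below are made up


def resolve(responses):
    """Last Write Wins: keep the cell with the highest timestamp."""
    return max(responses, key=lambda cell: cell.timestamp)


# Two replicas disagree about A: one has the newer write, one the older.
responses = [Cell(value=2, timestamp=2000), Cell(value=1, timestamp=1000)]
print(resolve(responses).value)
```

Because the newer value wins regardless of which replica answers first, a QUORUM read that overlaps the replicas holding the latest write returns A=2.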



Writes in the cluster

• Fully distributed, no SPOF

• Node that receives a request is the Coordinator for request

• Any node can act as Coordinator

[Diagram: Client sends “Write A” (CL=ONE) to one node of the four-node ring, which acts as the Coordinator Node.]

Writes in the cluster – Data Distribution

• Partition Key determines node placement


Partition Key

id='pmcfadin' lastname='McFadin'

id='jhaddad' firstname='Jon' lastname='Haddad'

id='ltillman' firstname='Luke' lastname='Tillman'

CREATE TABLE users (
  id text,
  firstname text,
  lastname text,
  PRIMARY KEY (id)
);

Writes in the cluster – Data Distribution

• The Partition Key is hashed using a consistent hashing function (Murmur3) and the output is used to place the data on a node

• The data is also replicated to RF-1 other nodes

[Diagram: Partition Key id: ltillman → Murmur3 → A; with RF=3 the row is stored on the three nodes of the four-node ring (A, B, C, D) that hold range A.]

Hashing – Back to Reality

• Back in reality, Partition Keys hash to large integer tokens (drawn here as 128-bit numbers; the Murmur3 partitioner's tokens are 64-bit)

• Nodes in Cassandra own token ranges (i.e. hash ranges)

[Diagram: four-node ring (A, B, C, D)]

Range  Start          End
A      0xC000000..1   0x0000000..0
B      0x0000000..1   0x4000000..0
C      0x4000000..1   0x8000000..0
D      0x8000000..1   0xC000000..0

Partition Key id='ltillman' → Murmur3 → 0xadb95e99da887a8a4cb474db86eb5769
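The lookup from token to owning node, and from there to the RF replicas, can be sketched as follows. The range boundaries and the hash value are taken from the slide's 128-bit illustration; the "next nodes clockwise" replica rule is a simplifying assumption that matches the ring drawn in the diagrams, and the function names are invented for this example.

```python
from bisect import bisect_left

NODES = ["A", "B", "C", "D"]
# Inclusive end token of each node's range, as in the table above:
# A ends at 0x00..0, B at 0x40..0, C at 0x80..0, D at 0xC0..0 (128-bit values).
RANGE_ENDS = [k << 126 for k in range(4)]


def owner_index(token: int) -> int:
    """Index of the node whose token range contains the token."""
    # First range end >= token; tokens above the last end wrap around to A.
    return bisect_left(RANGE_ENDS, token) % len(NODES)


def replicas(token: int, rf: int = 3) -> list:
    """The owner plus the next rf-1 nodes clockwise around the ring."""
    first = owner_index(token)
    return [NODES[(first + i) % len(NODES)] for i in range(rf)]


token = 0xADB95E99DA887A8A4CB474DB86EB5769  # hash value shown for id='ltillman'
print(NODES[owner_index(token)], replicas(token))
```

With the table's boundaries this token falls between 0x8000000..1 and 0xC000000..0, so it lands in range D and, with RF=3, is also stored on A and B.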

Writes on a single node

• Client makes a write request

[Diagram: a Client sends the following statement to a node with Memory and Disk]

UPDATE users SET firstname = 'Luke' WHERE id = 'ltillman'


• Data is appended to the Commit Log

• Cassandra writes are FAST because storage is append-only (log-structured)

[Diagram: the Commit Log on Disk gains the entry id='ltillman', firstname='Luke']


• Data is written to Memtable

[Diagram: in Memory, the Memtable for Users now holds id='ltillman' firstname='Luke' lastname='Tillman', alongside some other Memtable.]


• Server acknowledges to client


• Once Memtable is full, data is flushed to disk as SSTable (Sorted String Table)

[Diagram: the Memtable is flushed from Memory to the Data Directory on Disk, which holds SSTable #1 and SSTable #2 for Users alongside some other SSTable.]
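The write steps above (append to the commit log, update the Memtable, acknowledge, flush when full) can be modelled with a toy class. Everything here is an illustrative assumption: the `Node` class, the `memtable_limit` threshold counted in partitions, and flushing inline rather than in the background as a real node would.

```python
class Node:
    """Toy single-node write path: commit log -> memtable -> SSTable flush."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only list standing in for the on-disk log
        self.memtable = {}     # in memory: partition key -> {column: value}
        self.sstables = []     # flushed tables: immutable and sorted by key
        self.memtable_limit = memtable_limit  # hypothetical "full" threshold

    def write(self, key, columns):
        self.commit_log.append((key, dict(columns)))        # 1. append to the log
        self.memtable.setdefault(key, {}).update(columns)   # 2. update the memtable
        if len(self.memtable) >= self.memtable_limit:       # flush once "full"
            self.flush()                                    # (inline in this toy)
        return "ack"                                        # 3. acknowledge

    def flush(self):
        rows = ((k, tuple(sorted(v.items()))) for k, v in self.memtable.items())
        self.sstables.append(tuple(sorted(rows)))  # tuples: never modified again
        self.memtable = {}


node = Node(memtable_limit=2)
node.write("ltillman", {"firstname": "Luke"})
node.write("jhaddad", {"firstname": "Jon"})
print(len(node.commit_log), node.memtable, len(node.sstables))
```

The sketch does not model commit log truncation or crash recovery; it only shows that a write never has to read or rewrite existing data.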

Compaction

• Compactions merge and unify data in our SSTables

• SSTables are immutable, so this is when we consolidate rows

[Diagram: SSTable #1 and SSTable #2 for Users hold id='ltillman' firstname='Lucas' (timestamp=Older) lastname='Tillman' and id='ltillman' firstname='Luke' (timestamp=Newer); compaction merges them into SSTable #3 for Users.]
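A compaction of the two SSTables in the diagram can be sketched as a per-column merge in which the newer timestamp wins. The dictionary layout and the `compact` function are assumptions made for illustration; deletes (tombstones) are not modelled.

```python
def compact(*sstables):
    """Merge SSTables into one; for each column the newest timestamp wins.

    Each SSTable is modelled as {row_key: {column: (value, timestamp)}}.
    """
    merged = {}
    for sstable in sstables:
        for key, columns in sstable.items():
            row = merged.setdefault(key, {})
            for name, (value, ts) in columns.items():
                if name not in row or ts > row[name][1]:
                    row[name] = (value, ts)
    return merged


older = {"ltillman": {"firstname": ("Lucas", 1), "lastname": ("Tillman", 1)}}
newer = {"ltillman": {"firstname": ("Luke", 2)}}
print(compact(older, newer))
```

The merged row keeps firstname='Luke' from the newer SSTable and lastname='Tillman' from the older one, and the inputs are left untouched, in keeping with SSTables being immutable.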

Reads in the cluster

• Same as writes in the cluster, reads are coordinated

• Any node can be the Coordinator Node

[Diagram: Client sends “Read A” (CL=QUORUM) to one node of the four-node ring, which acts as the Coordinator Node.]

Reads on a single node

• Client makes a read request

[Diagram: a Client sends the following statement to a node with Memory and Disk]

SELECT firstname, lastname FROM users WHERE id = 'ltillman'


• Data is read from (possibly multiple) SSTables and merged

• Reads in Cassandra are also FAST but are limited by Disk IO

[Diagram: on Disk, SSTable #1 for Users holds lastname='Tillman' and SSTable #2 for Users holds firstname='Luke' lastname='Tillman'.]


• Any unflushed Memtable data is also merged

[Diagram: in Memory, the Memtable for Users holds firstname='Luke' lastname='Tillman'.]


• Client gets acknowledgement with the data

[Diagram: the Client receives firstname='Luke' lastname='Tillman'.]

Compaction - Revisited

• Compactions merge and unify data in our SSTables, making them important to reads (fewer SSTables = less to read/merge)

[Diagram: SSTable #1 and SSTable #2 for Users are compacted into SSTable #3 for Users, which holds lastname='Tillman'.]



Cassandra Query Language (CQL)


Data Structures

• Keyspace is like RDBMS Database or Schema

• Like RDBMS, Cassandra uses Tables to store data

• Partitions can have one row (narrow) or multiple rows (wide)

[Diagram: nesting of Keyspace > Tables > Partitions > Rows]

Schema Definition (DDL)

• Easy to define tables for storing data

• First part of Primary Key is the Partition Key

CREATE TABLE videos (
  videoid uuid,
  userid uuid,
  name text,
  description text,
  tags set<text>,
  added_date timestamp,
  PRIMARY KEY (videoid)
);

Schema Definition (DDL)

• One row per partition (familiar)

videoid       name                  ...
689d56e5- …   Keyboard Cat          ...
93357d73- …   Nyan Cat              ...
d978b136- …   Original Grumpy Cat   ...

Clustering Columns

• Second part of Primary Key is Clustering Columns

• Clustering columns affect ordering of data (on disk)

• Multiple rows per partition

CREATE TABLE comments_by_video (
  videoid uuid,
  commentid timeuuid,
  userid uuid,
  comment text,
  PRIMARY KEY (videoid, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);

Clustering Columns – Wide Rows (Partitions)

• Use of Clustering Columns is where the term “Wide Rows” comes from

Partition videoid='0fe6a...'
  commentid='82be1...' (10/1/2014 9:36AM): userid='ac346...', comment='Awesome!'
  commentid='765ac...' (9/17/2014 7:55AM): userid='f89d3...', comment='Garbage!'
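As a mental model, a wide partition behaves like a map from partition key to a list of rows kept sorted by the clustering column. The sketch below is plain Python, not driver code; the integer `comment_ts` stands in for the timeuuid, and the shortened ids are placeholders rather than the real values elided on the slide.

```python
from collections import defaultdict

# partition key (videoid) -> rows kept sorted by the clustering column, newest first
comments_by_video = defaultdict(list)


def insert_comment(videoid, comment_ts, userid, comment):
    rows = comments_by_video[videoid]
    rows.append((comment_ts, userid, comment))
    # Mirrors CLUSTERING ORDER BY (commentid DESC): order is fixed at write time.
    rows.sort(key=lambda row: row[0], reverse=True)


def latest_comments(videoid, limit):
    """Like SELECT ... WHERE videoid = ? LIMIT ?: one partition, already ordered."""
    return comments_by_video[videoid][:limit]


insert_comment("0fe6a", 20140917, "f89d3", "Garbage!")
insert_comment("0fe6a", 20141001, "ac346", "Awesome!")
print(latest_comments("0fe6a", 1))
```

Because the rows are stored in clustering order, reading the most recent comments is a slice of one partition rather than a sort at query time.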

Inserts and Updates

• Use INSERT or UPDATE to add and modify data

• Both will overwrite data (no constraints like RDBMS)

• INSERT and UPDATE are functionally equivalent

INSERT INTO comments_by_video ( videoid, commentid, userid, comment) VALUES ( '0fe6a...', '82be1...', 'ac346...', 'Awesome!');

UPDATE comments_by_video SET userid = 'ac346...', comment = 'Awesome!' WHERE videoid = '0fe6a...' AND commentid = '82be1...';

TTL and Deletes

• Can specify a Time to Live (TTL) in seconds when doing an INSERT or UPDATE

• Use DELETE statement to remove data

• Can optionally specify columns to remove part of a row


INSERT INTO comments_by_video ( ... ) VALUES ( ... ) USING TTL 86400;

DELETE FROM comments_by_video WHERE videoid = '0fe6a...' AND commentid = '82be1...';

Querying

• Use SELECT to get data from your tables

• Always include Partition Key and optionally Clustering Columns

• Can use ORDER BY and LIMIT

• Use range queries (for example, by date) to slice partitions


SELECT * FROM comments_by_video WHERE videoid = 'a67cd...' LIMIT 10;

Cassandra Data Modeling

• Requires a different mindset than RDBMS modeling

• Know your data and your queries up front

• Queries drive a lot of the modeling decisions (i.e. “table per query” pattern)

• Denormalize/Duplicate data at write time to do as few queries as possible come read time

• Remember, disk is cheap and writes in Cassandra are FAST

Cassandra Data Modeling – A Quick Example

• Users are looked up by a unique id, but at login they need to be looked up by email address

• Some data is duplicated (email, userid) but that’s OK

CREATE TABLE users (
  userid uuid,
  firstname text,
  lastname text,
  email text,
  PRIMARY KEY (userid)
);

CREATE TABLE users_by_email (
  email text,
  password text,
  userid uuid,
  PRIMARY KEY (email)
);
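The “table per query” idea behind these two tables can be sketched with two in-memory dictionaries, one per lookup. This is an illustration of the access pattern only: `create_user` and `login` are hypothetical helpers, and the plain-text password comparison is a simplification that a real application would replace with a salted hash.

```python
import uuid

users = {}           # userid -> profile      (models the users table)
users_by_email = {}  # email  -> credentials  (models the users_by_email table)


def create_user(firstname, lastname, email, password):
    """Write to both tables up front so each read is a single key lookup."""
    userid = uuid.uuid4()
    users[userid] = {"firstname": firstname, "lastname": lastname, "email": email}
    users_by_email[email] = {"password": password, "userid": userid}
    return userid


def login(email, password):
    """Look up by email first, then fetch the profile by userid."""
    row = users_by_email.get(email)
    if row is None or row["password"] != password:  # plain text only in this toy
        return None
    return users[row["userid"]]


create_user("Luke", "Tillman", "luke@example.com", "secret")
print(login("luke@example.com", "secret")["firstname"])
```

The email and userid are stored twice, which costs an extra write but means neither lookup needs a scan or a join.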

Who’s using it?


Cassandra Adoption

Some Common Use Case Categories

• Product Catalogs and Playlists

• Internet of Things (IoT) and Sensor Data

• Messaging (emails, IMs, alerts, comments)

• Recommendation and Personalization

• Fraud Detection

• Time series and temporal ordered data

http://planetcassandra.org/apache-cassandra-use-cases/









The “Slide Heard Round the World”

• From Cassandra Summit 2014, got a lot of attention

• 75,000+ nodes

• 10s of PBs of data

• Millions ops/s

• One of the largest known Cassandra deployments


Spotify

• Streaming music web service

• > 24,000,000 music tracks

• > 50TB of data in Cassandra

Why Cassandra?

• Was on PostgreSQL, but hit scaling problems

• Multi Datacenter Availability

• Integration with Spark for data processing and analytics

Usage

• Catalog

• User playlists

• Artists following

• Radio Stations

• Event notifications


http://planetcassandra.org/blog/interview/spotify-scales-to-the-top-of-the-charts-with-apache-cassandra-at-40k-requestssecond/




























eBay

• Online auction site

• > 250TB of data, dozens of nodes, multiple data centers

• > 6 billion writes, > 5 billion reads per day

Why Cassandra?

• Low latency, high scale, multiple data centers

• Suited for graph structures using wide rows

Usage

• Building next generation of recommendation engine

• Storing user activity data

• Updating models of user interests in real time

http://planetcassandra.org/blog/5-minute-c-interview-ebay/











FullContact

• Contact management: from multiple sources, sync, de-dupe, APIs available

• 2 clusters, dozens of nodes, running in AWS

• Based here in Denver

Why Cassandra?

• Migrated from MongoDB after running into scaling issues

• Operational simplicity

• Resilience and Availability

Usage

• Person API (search by email, Twitter handle, Facebook, or phone)

• Searched data from multiple sources (ingested by Hadoop M/R jobs)

• Resolved profiles

http://planetcassandra.org/blog/fullcontact-readies-their-search-platform-to-scale-moves-from-mongodb-to-apache-cassandra/



























Instagram

• Photo-sharing, video-sharing and social networking service

• Originally AWS (Now Facebook data centers?)

• > 20k writes/second, > 15k reads/second

Why Cassandra?

• Migrated from Redis (problems keeping everything in memory)

• No painful “sharding” process

• 75% reduction in costs

Usage

• Auditing information – security, integrity, spam detection

• News feed (“inboxes” or activity feed)

– Likes, Follows, etc.

http://planetcassandra.org/blog/instagram-making-the-switch-to-cassandra-from-redis-75-instasavings/

Summit 2014 Presentation: https://www.youtube.com/watch?v=_gc94ITUitY























Netflix

• TV and Movie streaming service

• 2,700+ nodes across more than 90 clusters

• 4 Datacenters

• > 1 Trillion operations per day

Why Cassandra?

• Migrated from Oracle

• Massive amounts of data

• Multi datacenter, No SPOF

• No downtime for schema changes

Usage

• Everything! (Almost – 95% of DB use)

• Example: Personalization

– What titles do you play?

– What do you play before/after?

– Where did you pause?

– What did you abandon watching after 5 minutes?

http://planetcassandra.org/blog/case-study-netflix/

Summit 2014 Presentation: https://www.youtube.com/watch?v=RMSNLP_ORg8&index=43&list=UUvP-AXuCr-naAeEccCfKwUA













Questions?

Follow me for updates or to ask questions later: @LukeTillman
