State of Cassandra, 2012

Jonathan Ellis
Project Chair, Apache Cassandra
CTO, DataStax
@spyced



©2012 DataStax

Some Cassandra users, early 2011


Some Cassandra users, mid 2012


eBay

Application/Use Case
• Social Signals: like/want/own features for eBay product and item pages
• Hunch taste graph for eBay users and items
• Many time series use cases

Why Cassandra?
• Multi-datacenter
• Scalable
• Write performance
• Distributed counters
• Hadoop support

ACE


Time series data


Multi-datacenter support


Distributed counters


Hadoop support


Disney

Application/Use Case
• Meet the data management needs of user-facing applications across The Walt Disney Company with a single platform

Why Cassandra?
• DataStax Enterprise can tackle real-time and search functions in the same cluster
• Scalability
• 24x7 uptime

NDI


Multitenancy



Multitenancy


Enterprise search


SimpleReach

Application/Use Case
• SimpleReach tracks social actions for content creators, from Twitter and Facebook to Pinterest and Reddit, to deliver detailed insights and clear metrics around social behavior.

Why Cassandra?
• Very high velocity data ingest rate and large data volumes
• Workload separation between realtime and batch applications

NDE


SourceNinja

Application/Use Case
• SourceNinja notifies you of performance, security, and bug fixes for the software you depend on

Why Cassandra?
• Previous database system could not handle load; HBase had too many points of failure and was too slow
• Fast real time capabilities, batch analytics on that data, and enterprise search

RDE


Realtime + search + analytics = DataStax Enterprise


Netflix

Application/Use Case
• General purpose backend for large scale highly available cloud based web services supporting Netflix Streaming

Why Cassandra?
• Highly available, highly robust and no schema change downtime
• Highly scalable, optimized for SSD
• Much lower cost than previous Oracle and SimpleDB implementations
• Flexible data model
• Ability to directly influence/implement OSS feature set
• Supports local and wide area distributed operations, spanning US and Europe

RCE


Optimized for SSD


Open source


Use case patterns

• Massively scalable
• High performance
• Reliable/Available


[Chart: reads/s and writes/s for Cassandra 0.6 and Cassandra 1.0, on an axis from 0 to 35000]


Recent Cassandra history

• 0.7 (Jan 2011)
  • CREATE COLUMN FAMILY
  • TTL
  • Secondary (column) indexes
• 0.8 (Jun 2011)
  • Counters
  • Automatic memtable tuning
• 1.0 (Oct 2011)
  • Compression
  • Leveled compaction


Present

• 1.1 (Apr 2012)
  • Self-tuning row + key caches
  • Support for mixed SSD + HDD nodes
  • Row-level isolation


Self-tuning Row Cache

[Diagram: two read paths. Without cache: Client, Merge, SSTables. With cache: Client, Row Cache.]


Mixed SSD/HDD Support

[Diagram labels: Client; Cassandra Node; Cassandra Instance; SSD; HDD; column families user_sessions and user_activity]


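Row Level Isolation

UPDATE Users
SET login='bar', password='bar'
WHERE key='e29b-41d4'

SELECT login, password
FROM Users
WHERE key='e29b-41d4'

[Diagram: a row with Login = Foo and Passwd = Foo is updated to Login = Bar and Passwd = Bar while the SELECT runs. The read result "Bar, Foo" (a partially applied update) appears alongside "Bar, Bar"; the two columns of the diagram are labelled Cassandra 1.0 and Cassandra 1.1.]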
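The contrast between a partially applied update and an isolated one can be modelled in a few lines of Python. This is only an illustrative sketch, not how Cassandra implements isolation: the function names are hypothetical, and the row is a plain dictionary holding the login/password values from the slide.

```python
def column_by_column_states(row, update):
    """States a concurrent reader might observe when an update is applied
    one column at a time, with no row-level isolation (a simplified model)."""
    states = [dict(row)]
    current = dict(row)
    for column, value in update.items():
        current[column] = value
        states.append(dict(current))
    return states


def isolated_states(row, update):
    """With row-level isolation a reader sees either the old row or the
    fully updated row, never a mixture of the two."""
    return [dict(row), {**row, **update}]


before = {"login": "foo", "password": "foo"}
update = {"login": "bar", "password": "bar"}
partial = {"login": "bar", "password": "foo"}

if __name__ == "__main__":
    print("partial row visible without isolation:",
          partial in column_by_column_states(before, update))
    print("partial row visible with isolation:",
          partial in isolated_states(before, update))
```

The mixed row (login already updated, password not yet) is reachable only in the column-by-column model, which is the anomaly the slide illustrates.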


ACID



Overloading “consistency”

• ACID consistency = referential integrity
• Distributed system consistency
  • {consistency, availability, partition tolerance}


Future

• 1.2 (Oct 2012?)
  • Concurrent schema changes
  • JBOD support
  • Virtual nodes
  • CQL3
  • Collections


Concurrent Schema Changes

[Diagram: two clients issue schema changes to the same Cassandra cluster]

Client:
CREATE TABLE X;
...
DROP TABLE X;

Client:
CREATE TABLE Y;
...
DROP TABLE Y;


JBOD support

[Diagram: one Cassandra instance over four disks, HDD1, HDD2, HDD3, and HDD4]


JBOD support

[Diagram: one Cassandra instance over four disks, HDD1, HDD2, HDD3, and HDD4, with HDD4 marked with an X]


Virtual nodes

[Diagram: two token rings. Ring without vnodes: six ranges, A to F. Ring with vnodes: sixteen smaller ranges, A to P.]


Node Rebuild without vnodes

[Diagram: ring without vnodes (ranges A to F) and six nodes, Node 1 to Node 6, each holding three of the ranges]


Node Rebuild with vnodes

[Diagram: ring with vnodes (ranges A to P) and six nodes, Node 1 to Node 6, each holding eight of the smaller ranges]
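Why the two rebuild diagrams differ can be sketched with a toy ring in Python. Everything here is an assumption made for illustration: the node names, the hashed token choice, a replication factor of 3, and the rule that a range is replicated on the next distinct nodes clockwise. It is a simplified model, not Cassandra's token allocation code.

```python
import hashlib


def build_ring(nodes, tokens_per_node, ring_size=2**64):
    """Return a sorted list of (token, node) pairs.

    tokens_per_node == 1 gives one evenly spaced token per node (no vnodes);
    larger values give each node several hashed tokens (a stand-in for vnodes).
    """
    ring = []
    for index, node in enumerate(nodes):
        if tokens_per_node == 1:
            ring.append((index * ring_size // len(nodes), node))
            continue
        for v in range(tokens_per_node):
            digest = hashlib.sha256(f"{node}-{v}".encode()).digest()
            ring.append((int.from_bytes(digest[:8], "big"), node))
    return sorted(ring)


def replicas_for(ring, position, rf):
    """Walk clockwise from ring[position] and collect rf distinct nodes.

    Assumes rf is no larger than the number of distinct nodes on the ring.
    """
    found = []
    i = position
    while len(found) < rf:
        node = ring[i % len(ring)][1]
        if node not in found:
            found.append(node)
        i += 1
    return found


def rebuild_sources(ring, target, rf=3):
    """Peers sharing at least one replicated range with `target`, i.e. the
    nodes that could stream data to it while it is being rebuilt."""
    peers = set()
    for position in range(len(ring)):
        owners = replicas_for(ring, position, rf)
        if target in owners:
            peers.update(n for n in owners if n != target)
    return peers


NODES = [f"node{i}" for i in range(1, 7)]  # hypothetical node names
classic_ring = build_ring(NODES, tokens_per_node=1)
vnode_ring = build_ring(NODES, tokens_per_node=8)

if __name__ == "__main__":
    print("without vnodes:", sorted(rebuild_sources(classic_ring, "node1")))
    print("with vnodes:   ", sorted(rebuild_sources(vnode_ring, "node1")))
```

In this model a node on the classic ring can only be rebuilt from its ring neighbours, whereas scattering many small ranges around the ring tends to spread the same work across more of the cluster.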


CQL: You got SQL in my NoSQL!

CREATE TABLE users (
    id uuid PRIMARY KEY,
    name text,
    state text,
    birth_date int
);

CREATE INDEX ON users(state);

SELECT * FROM users
WHERE state='Texas' AND birth_date > 1950;


Strictly “realtime” focused

• No joins
• No subqueries
• No aggregation functions* or GROUP BY
• Strictly limited ORDER BY


Example: CFS sblocks

create column family sblocks
  with comparator = 'UUIDType'
  and default_validation_class = 'BytesType'
  and key_validation_class = 'UUIDType'


sblocks in context


sblocks in CQL3

CREATE TABLE sblocks (
    block_id uuid,
    subblock_id uuid,
    data blob,
    PRIMARY KEY (block_id, subblock_id)
);

block_id   subblock_id   data
Block1     subblock A    data A
Block1     subblock B    data B
...        ...           ...
Block2     subblock C    data C
Block2     subblock D    data D
...        ...           ...
Block3     subblock E    data E
Block3     subblock F    data F
...        ...           ...
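The table above groups several subblocks under each block_id. One way to picture that is a map from block_id to a sorted map of subblock_id to data. The Python sketch below is a toy model under that assumption; the class name and methods are hypothetical, and plain strings stand in for the uuid and blob values.

```python
from collections import defaultdict


class SblocksModel:
    """Toy in-memory model of PRIMARY KEY (block_id, subblock_id):
    rows sharing a block_id live together, ordered by subblock_id."""

    def __init__(self):
        self._partitions = defaultdict(dict)

    def insert(self, block_id, subblock_id, data):
        # Writing the same (block_id, subblock_id) again overwrites the data.
        self._partitions[block_id][subblock_id] = data

    def select(self, block_id):
        """Return (block_id, subblock_id, data) rows for one block,
        sorted by subblock_id."""
        rows = self._partitions.get(block_id, {})
        return [(block_id, sub, rows[sub]) for sub in sorted(rows)]


table = SblocksModel()
table.insert("Block1", "subblock B", "data B")
table.insert("Block1", "subblock A", "data A")
table.insert("Block2", "subblock C", "data C")

if __name__ == "__main__":
    for row in table.select("Block1"):
        print(row)
```

Reading one block touches a single group of rows, which matches the slide's layout where all subblocks of Block1 appear together.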


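Collections

CREATE TABLE users (
    id uuid PRIMARY KEY,
    name text,
    state text,
    birth_date int
);

[Crossed out on the slide: a separate addresses table with a foreign key and a join, neither of which CQL offers]

CREATE TABLE users_addresses (
    user_id uuid REFERENCES users,
    email text
);

SELECT *
FROM users NATURAL JOIN users_addresses;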


Collections

CREATE TABLE users (
    id uuid PRIMARY KEY,
    name text,
    state text,
    birth_date int,
    email_addresses set<text>
);

UPDATE users
SET email_addresses = email_addresses + {'[email protected]', '[email protected]'};
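The `+` in the UPDATE above adds elements to a set column. A rough Python analogy is a set union, sketched below. The helper name and the example.com addresses are hypothetical stand-ins (the addresses on the slide are obscured in this transcript), and the sketch models only the set semantics, not how Cassandra stores collections.

```python
def add_to_set(current, additions):
    """Model `SET col = col + {...}` on a set<text> column as a set union."""
    return frozenset(current) | frozenset(additions)


# Hypothetical addresses, used only for illustration.
emails = frozenset()
emails = add_to_set(emails, {"first@example.com", "second@example.com"})
emails_again = add_to_set(emails, {"first@example.com"})

if __name__ == "__main__":
    print(sorted(emails))
    print("re-adding an existing address changes nothing:", emails == emails_again)
```

Because the column is a set, adding an address that is already present leaves it unchanged, so there is no need for a second table or a join to keep one user's addresses together.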