Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Talk at Cassandra conf 2013


Page 1: Jonathan Ellis "Apache Cassandra 2.0 and 2.1". Talk at Cassandra conf 2013

Modern Apache Cassandra

Jonathan Ellis
CTO, DataStax
Project Chair, Apache Cassandra

©2013 DataStax Confidential. Do not distribute without consent.

Page 2:

Five years of Cassandra

[Timeline figure. Axis dates: Jul-08, Jul-09, May-10, Feb-11, Dec-11, Oct-12, Jul-13. Releases marked: 0.1, 0.3, 0.6, 0.7, 1.0, 1.2, ..., 2.0, plus DSE.]

Page 3:
Page 4:

Application/Use Case
• Social Signals: like/want/own features for eBay product and item pages
• Hunch taste graph for eBay users and items
• Many time series use cases

Why Cassandra?
• Multi-datacenter
• Scalable
• Write performance
• Distributed counters
• Hadoop support

ACE

Page 5:

Time series data

Page 6:

Multi-datacenter support

Page 7:
Page 8:

Distributed counters

Page 9:

Hadoop support

Page 10:

Application/Use Case
• Adobe AudienceManager: web analytics, content management, and online advertising

Why Cassandra?
• Low-latency
• Scalable
• Multi-datacenter
• Tuneable consistency

ACE

Page 11:

Bootstrapping

Page 12:

Bootstrapping

Page 13:

Bootstrapping

Page 14:

Bootstrapping

Page 15:

Bootstrapping

Page 16:

Tuneable consistency
• (We’ll come back to this)

Page 17:

Application/Use Case
• Logging
• Notifications

Why Cassandra?
• Efficient writes
• Durable
• Scalable
• High availability

ACE

Page 18:

Durable + efficient writes

[Diagram: Memory holds the Memtable; Hard drive holds the Commit log.]

write(k1, c1:v1)

Page 19:

write(k1, c1:v)

Memory (Memtable):
  k1  c1:v
Hard drive (Commit log):
  k1  c1:v

Page 20:

write(k1, c2:v)

Memory (Memtable):
  k1  c1:v c2:v
Hard drive (Commit log):
  k1  c1:v
  k1  c2:v

Page 21:

write(k2, c1:v c2:v)

Memory (Memtable):
  k1  c1:v c2:v
  k2  c1:v c2:v
Hard drive (Commit log):
  k1  c1:v
  k1  c2:v
  k2  c1:v c2:v

Page 22:

write(k1, c1:v c3:v)

Memory (Memtable):
  k1  c1:v c2:v c3:v
  k2  c1:v c2:v
Hard drive (Commit log):
  k1  c1:v
  k1  c2:v
  k2  c1:v c2:v
  k1  c1:v c3:v

Page 23:

flush

[Diagram: the Memtable is flushed from Memory to an SSTable on the Hard drive.]

Hard drive (SSTable):
  k1  c1:v c2:v c3:v
  k2  c1:v c2:v
  index / BF

cleanup
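
The write sequence on pages 18 to 23 can be replayed with a toy model. This is only a sketch under stated assumptions, not Cassandra's implementation: the class name `WritePath` and its fields are hypothetical, the "disk" is an in-memory list, and the flush simply empties the commit log to mirror the "cleanup" label on the slide.

```python
class WritePath:
    """Toy model of the slides' write path (hypothetical names throughout)."""

    def __init__(self):
        self.commit_log = []  # append-only record of every write, in arrival order
        self.memtable = {}    # key -> {column: value}, merged in memory
        self.sstables = []    # immutable, key-sorted snapshots produced by flush

    def write(self, key, columns):
        # Durability first: append to the commit log, then update the memtable.
        self.commit_log.append((key, dict(columns)))
        self.memtable.setdefault(key, {}).update(columns)

    def flush(self):
        # Freeze the memtable into a sorted, immutable structure, then clean up.
        sstable = tuple(
            (key, tuple(sorted(cols.items())))
            for key, cols in sorted(self.memtable.items())
        )
        self.sstables.append(sstable)
        self.memtable = {}
        self.commit_log = []


# Replay the four writes shown on pages 19 to 22.
wp = WritePath()
wp.write("k1", {"c1": "v"})
wp.write("k1", {"c2": "v"})
wp.write("k2", {"c1": "v", "c2": "v"})
wp.write("k1", {"c1": "v", "c3": "v"})
```

After these four writes the commit log holds four entries while the memtable holds two merged rows; calling `wp.flush()` then produces one SSTable with rows for k1 and k2, matching page 23.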

Page 24:

High availability
• 99.9999% availability on Cassandra
• (We’ll come back to this, too)

Page 25:

Core values
• Massive scalability
• High performance
• Ease of use
• Reliability/Availability

Cassandra, HBase, Redis, MySQL

Page 26:

VLDB benchmark (RWS)

[Chart: throughput (ops/sec, axis 0 to 80000) against number of nodes (axis 0 to 12) for Cassandra, HBase, Redis, and MySQL.]

Page 27:

Endpoint benchmark (RW)

[Chart: throughput (ops/sec, axis 0 to 35000) against number of nodes (1, 2, 4, 8, 16, 32) for Cassandra, HBase, and MongoDB.]

Page 28:

Ease of use

CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  state text,
  birth_date int
);

CREATE INDEX ON users(state);

SELECT * FROM users
WHERE state='Texas' AND birth_date > 1950;

Page 29:
Page 30:

Classic partitioning (SPOF)

[Diagram: client, router, and partitions 1 through 4.]

Page 31:

(Not a theoretical problem)

https://speakerdeck.com/mitsuhiko/a-year-of-mongodb

http://aphyr.com/posts/288-the-network-is-reliable

Page 32:

Fully distributed, no SPOF

[Diagram: Client and nodes; legible partition labels: p1, p1, p1, p3, p6.]

Page 33:

Partitioning

Primary key determines placement*

Key     Columns
jim     age: 36  car: camaro  gender: M
carol   age: 37  car: subaru  gender: F
johnny  age: 12  gender: M
suzy    age: 10  gender: F

Page 34:

PK      Murmur Hash
jim     5e02739678...
carol   a9a0198010...
johnny  f4eb27cea7...
suzy    78b421309e...

Murmur* hash operation yields a 64-bit number for keys of any size.

Page 35:

The “token ring”

[Diagram: Node A, Node B, Node C, and Node D arranged in a ring.]

Page 36:

Key     Hash
jim     5e02739678...
carol   a9a0198010...
johnny  f4eb27cea7...
suzy    78b421309e...

Node  Start            End
A     0xc000000000..1  0x0000000000..0
B     0x0000000000..1  0x4000000000..0
C     0x4000000000..1  0x8000000000..0
D     0x8000000000..1  0xc000000000..0

Page 37:
Page 38:
Page 39:
Page 40:
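
The lookup implied by the Start/End table can be sketched in a few lines. This is an illustration under assumptions, not Cassandra code: the digits elided on the slide ("..", "...") are treated as zeros purely so the numbers are comparable, and the names `RING`, `token_from_prefix`, and `owner` are hypothetical.

```python
from bisect import bisect_left

# End token of each node's range, from the Start/End table.
# Assumption: the digits elided on the slide are zeros.
RING = [
    (0x0000000000000000, "A"),  # A's range wraps around and ends at 0x00..0
    (0x4000000000000000, "B"),
    (0x8000000000000000, "C"),
    (0xC000000000000000, "D"),
]
END_TOKENS = [end for end, _ in RING]


def token_from_prefix(hex_prefix):
    """Pad a hash prefix from the slide to 64 bits (low bits assumed zero)."""
    return int(hex_prefix, 16) << (64 - 4 * len(hex_prefix))


def owner(token):
    """Return the node whose (start, end] range contains the token."""
    i = bisect_left(END_TOKENS, token)
    # Tokens above the largest end token wrap around to the first node.
    return RING[i % len(RING)][1]


# Hash prefixes exactly as printed on the slide.
KEYS = {
    "jim": "5e02739678",
    "carol": "a9a0198010",
    "johnny": "f4eb27cea7",
    "suzy": "78b421309e",
}
owners = {name: owner(token_from_prefix(prefix)) for name, prefix in KEYS.items()}
```

With the table above this places jim and suzy on C, carol on D, and johnny on A (his token is past D's end, so it wraps).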

Page 41:

Replication

[Diagram: ring of Node A, Node B, Node C, and Node D; key carol (a9a0198010...).]

Page 42:

[Diagram: ring of Node A, Node B, Node C, and Node D; key carol (a9a0198010...).]

Page 43:

[Diagram: ring of Node A, Node B, Node C, and Node D; key carol (a9a0198010...).]
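
One simple way to picture replica placement is to start at the node that owns the key's token and continue clockwise around the ring. The sketch below assumes that simple ring-walk placement and a replication factor of 3; neither is stated on the slides, and the names `NODES` and `replicas` are hypothetical.

```python
# Ring order, clockwise, as drawn on the token ring slide.
NODES = ["A", "B", "C", "D"]


def replicas(primary, rf):
    """Return the primary node followed by the next rf - 1 nodes clockwise.

    Assumption: simple ring-walk placement, ignoring racks and datacenters.
    """
    if not 1 <= rf <= len(NODES):
        raise ValueError("replication factor must be between 1 and the node count")
    start = NODES.index(primary)
    return [NODES[(start + i) % len(NODES)] for i in range(rf)]


# carol's hash (a9a0198010...) falls in D's range in the Start/End table.
carol_replicas = replicas("D", rf=3)
```

Under these assumptions carol is stored on D, A, and B, so losing any single node still leaves two copies.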

Page 44:

Virtual nodes

Without vnodes: ring of Node A, Node B, Node C, Node D.
With vnodes: ring segments labeled A, A’, A’’, B, B’, C, C’, C’’, D, D’.

Page 45:

A closer look at reads

[Diagram: client, coordinator, and three replicas at 40%, 90%, and 30% busy.]

Page 46:

A closer look at reads

[Diagram: client, coordinator, and three replicas at 40%, 90%, and 30% busy.]

Page 47:

A closer look at reads

[Diagram: client, coordinator, and three replicas at 40%, 90%, and 30% busy.]

Page 48:

A closer look at reads

[Diagram: client, coordinator, and three replicas at 40%, 90%, and 30% busy.]

Page 49:

A closer look at reads

[Diagram: client, coordinator, and three replicas at 40%, 90%, and 30% busy.]

Page 50:

Rapid read protection

[Diagram: client, coordinator, and three replicas at 40%, 90%, and 30% busy.]

Page 51:

Rapid read protection

[Diagram: client, coordinator, and three replicas at 40%, 90%, and 30% busy.]

Page 52:

Rapid read protection

[Diagram: client, coordinator, and three replicas at 40%, 90%, and 30% busy.]

Page 53:

Rapid read protection

[Diagram: client, coordinator, and three replicas at 40%, 90%, and 30% busy; one replica is marked X.]

Page 54:

Rapid read protection

[Diagram: client, coordinator, and three replicas at 40%, 90%, and 30% busy; one replica is marked X.]

Page 55:

Rapid read protection

[Diagram: client, coordinator, and three replicas at 40%, 90%, and 30% busy; one replica is marked X.]

Page 56:

Rapid read protection

[Diagram: client, coordinator, and three replicas at 40%, 90%, and 30% busy; one replica is marked X.]

Page 57:

Rapid Read Protection

[Chart; legible label: NONE.]
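
The idea behind rapid read protection can be approximated as a speculative retry: if the first replica has not answered within some threshold, the coordinator asks a second replica and takes whichever answer arrives first. The function below is a minimal latency model of that idea, not Cassandra's implementation; the name `read_latency` and all the numbers are hypothetical.

```python
def read_latency(primary_ms, backup_ms, retry_after_ms=None):
    """Latency the coordinator observes for one read, in milliseconds.

    retry_after_ms=None models no protection: always wait for the primary.
    Otherwise, once retry_after_ms has passed without a reply, a second
    request goes to the backup replica and the first reply wins.
    """
    if retry_after_ms is None or primary_ms <= retry_after_ms:
        return primary_ms
    return min(primary_ms, retry_after_ms + backup_ms)


# A replica that has stalled (500 ms) next to a healthy one (10 ms).
unprotected = read_latency(500, 10)
protected = read_latency(500, 10, retry_after_ms=20)
```

In this model the unprotected read takes the full 500 ms, while the protected read completes in 30 ms (20 ms of waiting plus 10 ms from the backup). A healthy primary is unaffected because no retry is ever sent.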

Page 58:

Consistency levels

[Diagram: client, coordinator, and three replicas at 40%, 90%, and 30% busy.]

Page 59:

Consistency levels

[Diagram: client, coordinator, and three replicas at 40%, 90%, and 30% busy.]

Page 60:

Consistency levels

[Diagram: client, coordinator, and three replicas at 40%, 90%, and 30% busy.]

Page 61:

Consistency levels

[Diagram: client, coordinator, and three replicas at 40%, 90%, and 30% busy.]

Page 62:

Consistency levels

[Diagram: client, coordinator, and three replicas at 40%, 90%, and 30% busy.]

Page 63:

Consistency levels
• ONE
• QUORUM
• LOCAL_QUORUM
• LOCAL_ONE
• TWO
• ALL
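
To make the levels concrete, the sketch below computes how many replica responses a coordinator would wait for at each level, given a replication factor. The function names and the `local_rf` parameter are hypothetical; the quorum rule used is the usual majority, floor(RF / 2) + 1.

```python
def required_acks(level, rf, local_rf=None):
    """Replica responses to wait for at a consistency level.

    rf is the total replication factor; local_rf (an assumed parameter) is the
    replication factor in the coordinator's own datacenter.
    """
    if level in ("ONE", "LOCAL_ONE"):
        return 1
    if level == "TWO":
        return 2
    if level == "QUORUM":
        return rf // 2 + 1
    if level == "LOCAL_QUORUM":
        if local_rf is None:
            raise ValueError("LOCAL_QUORUM needs the local replication factor")
        return local_rf // 2 + 1
    if level == "ALL":
        return rf
    raise ValueError(f"unknown consistency level: {level}")


def read_overlaps_write(read_acks, write_acks, rf):
    """True when every read set must share a replica with every write set."""
    return read_acks + write_acks > rf


quorum_rf3 = required_acks("QUORUM", rf=3)
```

With three replicas, QUORUM means two responses, and QUORUM reads combined with QUORUM writes overlap (2 + 2 > 3), whereas ONE with ONE does not (1 + 1 <= 3).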

Page 64:

#CASSANDRAEU

Race condition

SELECT name FROM users WHERE username = 'pmcfadin';

Page 65:

Race condition

SELECT name FROM users WHERE username = 'pmcfadin';
(0 rows)

SELECT name FROM users WHERE username = 'pmcfadin';

Page 66:

Race condition

SELECT name FROM users WHERE username = 'pmcfadin';
(0 rows)

SELECT name FROM users WHERE username = 'pmcfadin';
(0 rows)

INSERT INTO users (username, name, email, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', ['[email protected]'], 'ba27e03fd9...', '2011-06-20 13:50:00');

Page 67:

Race condition

SELECT name FROM users WHERE username = 'pmcfadin';
(0 rows)

SELECT name FROM users WHERE username = 'pmcfadin';
(0 rows)

INSERT INTO users (username, name, email, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', ['[email protected]'], 'ba27e03fd9...', '2011-06-20 13:50:00');

INSERT INTO users (username, name, email, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', ['[email protected]'], 'ea24e13ad9...', '2011-06-20 13:50:01');

Page 68:

Race condition

SELECT name FROM users WHERE username = 'pmcfadin';
(0 rows)

SELECT name FROM users WHERE username = 'pmcfadin';
(0 rows)

INSERT INTO users (username, name, email, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', ['[email protected]'], 'ba27e03fd9...', '2011-06-20 13:50:00');

INSERT INTO users (username, name, email, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', ['[email protected]'], 'ea24e13ad9...', '2011-06-20 13:50:01');
(This one wins)

Page 69:

Lightweight transactions

INSERT INTO users (username, name, email, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', ['[email protected]'], 'ba27e03fd9...', '2011-06-20 13:50:00')
IF NOT EXISTS;

Page 70:

Lightweight transactions

INSERT INTO users (username, name, email, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', ['[email protected]'], 'ba27e03fd9...', '2011-06-20 13:50:00')
IF NOT EXISTS;

 [applied]
-----------
      True

INSERT INTO users (username, name, email, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', ['[email protected]'], 'ea24e13ad9...', '2011-06-20 13:50:01')
IF NOT EXISTS;

Page 71:

Lightweight transactions

INSERT INTO users (username, name, email, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', ['[email protected]'], 'ba27e03fd9...', '2011-06-20 13:50:00')
IF NOT EXISTS;

 [applied]
-----------
      True

INSERT INTO users (username, name, email, password, created_date)
VALUES ('pmcfadin', 'Patrick McFadin', ['[email protected]'], 'ea24e13ad9...', '2011-06-20 13:50:01')
IF NOT EXISTS;

 [applied] | username | created_date   | name
-----------+----------+----------------+-----------------
     False | pmcfadin | 2011-06-20 ... | Patrick McFadin
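
The difference between the plain INSERT and INSERT ... IF NOT EXISTS can be mimicked with an in-memory table. This is a single-process sketch of the observable semantics only: it involves no replicas and no Paxos, and the class `Users` and its methods are hypothetical names.

```python
class Users:
    """In-memory stand-in for the users table (hypothetical, single process)."""

    def __init__(self):
        self.rows = {}

    def insert(self, username, row):
        # Plain insert: silently overwrites, so the later write wins.
        self.rows[username] = row

    def insert_if_not_exists(self, username, row):
        # Conditional insert: returns (applied, existing_row).
        existing = self.rows.get(username)
        if existing is not None:
            return False, existing
        self.rows[username] = row
        return True, None


first = {"name": "Patrick McFadin", "password": "ba27e03fd9..."}
second = {"name": "Patrick McFadin", "password": "ea24e13ad9..."}

# Race condition: both clients saw 0 rows, both insert, the second overwrites.
racy = Users()
racy.insert("pmcfadin", first)
racy.insert("pmcfadin", second)

# Lightweight transaction: the second attempt is rejected and sees the winner.
safe = Users()
applied_1, _ = safe.insert_if_not_exists("pmcfadin", first)
applied_2, existing = safe.insert_if_not_exists("pmcfadin", second)
```

Here `applied_1` is True and `applied_2` is False, with `existing` holding the first client's row, which mirrors the [applied] columns shown on the slides.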

Page 72:

Paxos
• All operations are quorum-based
• Each replica sends information about unfinished operations to the leader during prepare
• Paxos made Simple

Page 73:

Details
• 4 round trips vs 1 for normal updates
• Paxos state is durable
• Immediate consistency with no leader election or failover
• ConsistencyLevel.SERIAL
• http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0

Page 75:

Cassandra 2.1

Page 76:

User defined types

CREATE TYPE address (
  street text,
  city text,
  zip_code int,
  phones set<text>
);

CREATE TABLE users (
  id uuid PRIMARY KEY,
  name text,
  addresses map<text, address>
);

SELECT id, name, addresses.city, addresses.phones FROM users;

 id       | name    | addresses.city | addresses.phones
----------+---------+----------------+--------------------------
 63bf691f | jbellis | Austin         | {'512-4567', '512-9999'}

Page 77:

Collection indexing

CREATE TABLE songs (
  id uuid PRIMARY KEY,
  artist text,
  album text,
  title text,
  data blob,
  tags set<text>
);

CREATE INDEX song_tags_idx ON songs(tags);

SELECT * FROM songs WHERE 'blues' IN tags;

 id       | album         | artist            | tags                  | title
----------+---------------+-------------------+-----------------------+------------------
 5027b27e | Country Blues | Lightnin' Hopkins | {'acoustic', 'blues'} | Worrying My Mind
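
Conceptually, an index on a set column maps each element to the rows that contain it. The sketch below shows that inverted-index idea in plain Python; it is not how Cassandra stores its indexes, the function `build_tag_index` is a hypothetical name, and the second song is made-up sample data added only so the lookup has something to exclude.

```python
from collections import defaultdict

songs = {
    # Row taken from the slide.
    "5027b27e": {
        "album": "Country Blues",
        "artist": "Lightnin' Hopkins",
        "tags": {"acoustic", "blues"},
        "title": "Worrying My Mind",
    },
    # Hypothetical row, not from the slide.
    "example-id": {
        "album": "Example Album",
        "artist": "Example Artist",
        "tags": {"rock"},
        "title": "Example Title",
    },
}


def build_tag_index(rows):
    """Map each tag to the set of song ids whose tags contain it."""
    index = defaultdict(set)
    for song_id, song in rows.items():
        for tag in song["tags"]:
            index[tag].add(song_id)
    return index


tag_index = build_tag_index(songs)
blues_titles = [songs[song_id]["title"] for song_id in sorted(tag_index["blues"])]
```

Looking up "blues" in `tag_index` returns only the row from the slide, which is the same answer the SELECT above is meant to give.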

Page 78:

More-efficient repair

Page 79:

More-efficient repair

Page 80:

More-efficient repair

Page 81:

More-efficient repair

Page 82:

More-efficient repair

Page 83:

More-efficient repair

Page 84:

More-efficient repair

Page 85:

More-efficient repair

Page 86:

More-efficient repair

Page 87:

2.1 roadmap
• Efficient handling of cold data
• Counters 2.0
• Only repair new-since-last-repair data
• January/February 2014

Page 88:

Questions?