Introduction to cassandra 2014

©2013 DataStax Confidential. Do not distribute without consent. @PatrickMcFadin Patrick McFadin Chief Evangelist/Solution Architect - DataStax Introduction to Cassandra

description

Apache Cassandra is a popular choice for a wide variety of application persistence needs. Many design choices can affect uptime and performance. In this talk we'll look at some of the things to consider, from a single server to multiple data centers. A basic understanding of Cassandra features, coupled with client driver features, can be a very powerful combination. This talk is an introduction, but it will dive deep into the technical details of how Cassandra works.

Transcript of Introduction to cassandra 2014

Page 1: Introduction to cassandra 2014

©2013 DataStax Confidential. Do not distribute without consent.

@PatrickMcFadin

Patrick McFadin Chief Evangelist/Solution Architect - DataStax

Introduction to Cassandra

Page 2: Introduction to cassandra 2014

Cassandra - An introduction

Page 3: Introduction to cassandra 2014

Five years of Cassandra

[Timeline figure: years 0 to 5 from Jul-08, marking releases 0.1, 0.3, 0.6, 0.7, 1.0, 1.2..., 2.0 and DSE]

Page 4: Introduction to cassandra 2014
Page 5: Introduction to cassandra 2014

Why Cassandra?

Or any non-relational?

Page 6: Introduction to cassandra 2014

The world has changed

More Connected

More Data

Page 7: Introduction to cassandra 2014

Less Tolerant

For the record. Can you spell HTTP?

Page 8: Introduction to cassandra 2014

Traditional solutions…

…may not be a good fit

Page 9: Introduction to cassandra 2014

We still try though

shard 1 shard 2 shard 3 shard 4

router

client

All your wildest dreams will come true.

Page 10: Introduction to cassandra 2014

A new plan

Let’s apply a little Computer Science

Page 11: Introduction to cassandra 2014

Dynamo Paper (2007)

• How do we build a data store that is:
  • Reliable
  • Performant
  • “Always On”
• Nothing new and shiny
• 24 papers cited

Evolutionary. Real. Computer Science

Also the basis for Riak and Voldemort

Page 12: Introduction to cassandra 2014

BigTable(2006)

• Richer data model
• 1 key. Lots of values
• Fast sequential access
• 38 papers cited

Page 13: Introduction to cassandra 2014

Cassandra(2008)

• Distributed features of Dynamo
• Data model and storage from BigTable
• February 17, 2010: graduated to a top-level Apache project

Page 14: Introduction to cassandra 2014

Cassandra - More than one server

• All nodes participate in a cluster
• Shared nothing
• Add or remove as needed
• More capacity? Add a server

Page 15: Introduction to cassandra 2014

[Figure: throughput (ops/sec) for Cassandra, HBase, Redis and MySQL]

VLDB benchmark (RWS) http://globalbigdataconference.com/bigdata-bootcamp-sca-january-2014.php

Page 16: Introduction to cassandra 2014

Cassandra - Fully Replicated

• Client writes local
• Data syncs across WAN
• Replication Factor per DC

Page 17: Introduction to cassandra 2014

Focus on a single server


Page 18: Introduction to cassandra 2014

Cassandra - Writes on a single node

Page 19: Introduction to cassandra 2014

Client writes to server

Page 20: Introduction to cassandra 2014

Data written to commit log

Page 21: Introduction to cassandra 2014

Data written to memtable

Page 22: Introduction to cassandra 2014

Server acknowledges to client

Page 23: Introduction to cassandra 2014

Memtable flushed to disk
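The single-node write steps on the previous slides (commit log, memtable, acknowledgement, flush) can be sketched in a few lines of Python. This is a toy model only: the class name `ToyNode`, its attributes, and the explicit `flush()` call are assumptions made for illustration, and a real node flushes in the background and writes to disk rather than to Python lists.

```python
class ToyNode:
    """Toy model of the single-node write path (illustrative only)."""

    def __init__(self):
        self.commit_log = []   # append-only record of writes, kept for crash recovery
        self.memtable = {}     # in-memory table holding the latest value per key
        self.sstables = []     # flushed, immutable tables (oldest first)

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. data written to commit log
        self.memtable[key] = value            # 2. data written to memtable
        return "ack"                          # 3. server acknowledges to client

    def flush(self):
        # 4. memtable flushed to disk as a sorted, immutable table
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        # In this toy model the flushed writes no longer need replaying.
        self.commit_log.clear()


node = ToyNode()
node.write("jim", {"age": 36})
node.write("carol", {"age": 37})
node.flush()
```

Note that the acknowledgement is returned before any flush happens: the write is already recorded in the commit log and visible in the memtable at that point.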

Page 24: Introduction to cassandra 2014

Dealing with duplicates

Page 25: Introduction to cassandra 2014

Compaction

Page 26: Introduction to cassandra 2014

Compacted file written

Page 27: Introduction to cassandra 2014

Clean up old files
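Compaction, as walked through on the last few slides, can be sketched as a merge that keeps only the newest version of each key. The function name `compact` and the `(key, timestamp, value)` tuple layout are assumptions for this sketch; it ignores deletes and on-disk formats.

```python
def compact(sstables):
    """Merge several tables into one, keeping the newest version of each key.

    Each table is a list of (key, timestamp, value) tuples.
    """
    latest = {}
    for table in sstables:
        for key, timestamp, value in table:
            # Keep the entry with the highest timestamp for every key.
            if key not in latest or timestamp > latest[key][0]:
                latest[key] = (timestamp, value)
    # The compacted table is written out sorted by key.
    return sorted((key, ts, value) for key, (ts, value) in latest.items())


old_table = [("jim", 1, "camaro"), ("suzy", 1, "bike")]
new_table = [("jim", 2, "mustang")]
compacted = compact([old_table, new_table])
```

Once the compacted table has been written, the input tables hold nothing that is not in the new one, which is why the old files can be cleaned up.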

Page 28: Introduction to cassandra 2014

Cassandra - Writes in the cluster

Page 29: Introduction to cassandra 2014

Fully distributed, no SPOF

[Diagram: a client connected to cluster nodes; partition labels p1, p1, p1, p3, p6]

Page 30: Introduction to cassandra 2014

Primary key determines placement*

Partitioning

jim age: 36 car: camaro gender: M

carol age: 37 car: subaru gender: F

johnny age: 12 gender: M

suzy age: 10 gender: F

Page 31: Introduction to cassandra 2014

PK       MD5 Hash
jim      5e02739678...
carol    a9a0198010...
johnny   f4eb27cea7...
suzy     78b421309e...

MD5* hash operation yields a 128-bit number for keys of any size.

Key Hashing
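As a rough illustration of key hashing, the sketch below turns a partition key of any length into a 128-bit integer with Python's standard `hashlib`. The helper name `md5_token` is an assumption, and the exact token encoding used by a real partitioner may differ, so the values printed here should not be expected to match the hashes shown on the slide.

```python
import hashlib


def md5_token(partition_key: str) -> int:
    """Hash a partition key to a 128-bit integer (simplified sketch)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()  # always 16 bytes
    return int.from_bytes(digest, "big")


for name in ("jim", "carol", "johnny", "suzy"):
    # Print each key next to its hash as 32 hex digits.
    print(name, format(md5_token(name), "032x"))
```

The same key always hashes to the same number, so every node can work out where a row belongs without a central lookup.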

Page 32: Introduction to cassandra 2014

Node A

Node D Node C

Node B

The Token Ring

Page 33: Introduction to cassandra 2014

jim 5e02739678...

carol a9a0198010...

johnny f4eb27cea7...

suzy 78b421309e...

Node  Start            End
A     0xc000000000..1  0x0000000000..0
B     0x0000000000..1  0x4000000000..0
C     0x4000000000..1  0x8000000000..0
D     0x8000000000..1  0xc000000000..0
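The range table can be read as a lookup from token to owning node. The sketch below encodes the four ranges for a 128-bit token space; the names `RANGE_ENDS` and `owner` are assumptions, and the tokens in the usage lines are stand-ins that only share their leading byte with the truncated hashes on the slide.

```python
QUARTER = 1 << 126  # 0x4000...0, one quarter of the 128-bit token space

# (node, inclusive end of its range); each range starts just after the previous end.
RANGE_ENDS = [("B", 1 * QUARTER), ("C", 2 * QUARTER), ("D", 3 * QUARTER)]


def owner(token: int) -> str:
    """Return the node whose range contains the token."""
    if token == 0:
        return "A"  # node A's range wraps around and ends at 0
    for node, end in RANGE_ENDS:
        if token <= end:
            return node
    return "A"      # everything above 0xc000...0 belongs to node A


# Stand-in tokens: leading byte from the slide, remaining bits set to zero.
jim, carol, johnny, suzy = (byte << 120 for byte in (0x5E, 0xA9, 0xF4, 0x78))
placement = {name: owner(token) for name, token in
             [("jim", jim), ("carol", carol), ("johnny", johnny), ("suzy", suzy)]}
```

With four equal ranges the leading byte alone decides the owner: a token starting 0x5e or 0x78 falls in C's range, 0xa9 in D's, and 0xf4 in A's.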

Page 38: Introduction to cassandra 2014

Node A

Node D Node C

Node B

carol a9a0198010...

Replication

Page 40: Introduction to cassandra 2014

Node A

Node D Node C

Node B

carol a9a0198010...

Replication

Replication factor = 3

Consistency? More on that later
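One simple way to picture replica placement is to start at the node that owns the token and keep walking the ring until the replication factor is met. The sketch below assumes that simple walk and a ring order of B, C, D, A (ascending token ranges); the names `RING` and `replicas` are hypothetical, and placement that is aware of racks or data centers is not modelled.

```python
RING = ["B", "C", "D", "A"]  # nodes in ascending token order, wrapping back to B


def replicas(primary: str, replication_factor: int, ring=RING):
    """Return the primary node plus the next nodes around the ring."""
    start = ring.index(primary)
    count = min(replication_factor, len(ring))  # cannot exceed the node count
    return [ring[(start + i) % len(ring)] for i in range(count)]


# carol's token (a9a0198010...) falls in node D's range.
carol_replicas = replicas("D", 3)
```

Under these assumptions, carol's row is stored on D and then on the next two nodes around the ring, A and B.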

Page 41: Introduction to cassandra 2014

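[Diagram: two token rings. Without vnodes: Node A, Node B, Node C and Node D each own one range. With vnodes: the ring is split into smaller ranges labelled A, A’, A’’, B, B’, C, C’, C’’, D, D’.]

Virtual Nodes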

Page 42: Introduction to cassandra 2014

Cassandra - Reads

Page 43: Introduction to cassandra 2014

Coordinated reads

Page 44: Introduction to cassandra 2014

Consistency Level

• Set with every read and write
• ONE
• QUORUM - a majority (more than 50%) of replicas ack
• LOCAL_QUORUM - a majority (more than 50%) of replicas ack in local DC
• LOCAL_ONE - Read repair only in local DC
• TWO
• ALL - All replicas ack. Full consistency

Page 45: Introduction to cassandra 2014

QUORUM and availability
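The arithmetic behind QUORUM and availability is small enough to write down. The sketch below uses the usual majority formula (half the replicas, rounded down, plus one); the function names are made up for this example.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while QUORUM requests still succeed."""
    return replication_factor - quorum(replication_factor)


def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set."""
    return read_replicas + write_replicas > replication_factor


rf = 3
needed = quorum(rf)                 # replicas that must ack
can_lose = tolerated_failures(rf)   # replicas that may be unavailable
```

With a replication factor of 3, QUORUM needs 2 acks and tolerates 1 replica being down. Reading and writing at QUORUM gives 2 + 2 > 3, so a read always touches at least one replica that holds the latest write, whereas ONE plus ONE does not guarantee that.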

Page 46: Introduction to cassandra 2014

Rapid Read Protection

NONE

Page 47: Introduction to cassandra 2014

Cassandra Query Language - CQL

Page 48: Introduction to cassandra 2014

CQL Tables

• List of columns
• No sizes?
• First item in PRIMARY KEY is the partition key

CREATE TABLE users (
    username varchar,
    firstname varchar,
    lastname varchar,
    email list<varchar>,
    password varchar,
    created_date timestamp,
    PRIMARY KEY (username)
);

INSERT INTO users (username, firstname, lastname,
    email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
    ['[email protected]'],'ba27e03fd95e507daf2937c937d499ab',
    '2011-06-20 13:50:00');

Page 49: Introduction to cassandra 2014

CQL Inserts

• Insert will always overwrite

INSERT INTO users (username, firstname, lastname,
    email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
    ['[email protected]'],'ba27e03fd95e507daf2937c937d499ab',
    '2011-06-20 13:50:00');
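"Insert will always overwrite" means an INSERT behaves like an upsert: there is no duplicate-key error, and a second insert for the same primary key replaces the column values it supplies. The Python sketch below models a table as a dictionary to show that behaviour; the `insert` helper is hypothetical and ignores write timestamps and deletes.

```python
def insert(table: dict, primary_key: str, **columns) -> dict:
    """Upsert: create the row if it is missing, else overwrite the given columns."""
    row = table.setdefault(primary_key, {})
    row.update(columns)
    return row


users = {}
insert(users, "pmcfadin", firstname="Patrick", lastname="McFadin")
# Same primary key again: no error, the supplied column is simply overwritten.
insert(users, "pmcfadin", firstname="Pat")
```

After both calls there is still a single row for `pmcfadin`, with the newer `firstname` and the original `lastname`.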

Page 50: Introduction to cassandra 2014

Advanced Data Models

CREATE TABLE user_activity (
    username varchar,
    interaction_time timeuuid,
    activity_code varchar,
    detail varchar,
    PRIMARY KEY (username, interaction_time)
) WITH CLUSTERING ORDER BY (interaction_time DESC);

Reverse order based on timestamp

INSERT INTO user_activity
    (username,interaction_time,activity_code,detail)
VALUES ('pmcfadin',0D1454E0-D202-11E2-8B8B-0800200C9A66,'100','Normal login')
USING TTL 2592000;

Expire after 30 days
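The two ideas on this slide, rows kept newest-first inside a partition and rows that expire after a TTL, can be modelled with a short Python sketch. Everything here is an assumption made for illustration: the class `ActivityPartition`, plain integers standing in for timeuuid values, and an explicit `now` argument instead of a real clock.

```python
THIRTY_DAYS = 30 * 24 * 60 * 60  # 2592000 seconds, the TTL used on the slide


class ActivityPartition:
    """One user's activity rows, kept in descending interaction_time order."""

    def __init__(self):
        self.rows = []  # (interaction_time, detail, expires_at)

    def insert(self, interaction_time, detail, now, ttl=THIRTY_DAYS):
        self.rows.append((interaction_time, detail, now + ttl))
        # Clustering order DESC: newest interaction first.
        self.rows.sort(key=lambda row: row[0], reverse=True)

    def select(self, now, limit):
        # Rows whose TTL has passed are no longer returned.
        live = [(t, detail) for t, detail, expires_at in self.rows if expires_at > now]
        return live[:limit]


partition = ActivityPartition()
partition.insert(100, "Normal login", now=100)
partition.insert(200, "Opened shopping cart", now=200)
recent = partition.select(now=300, limit=5)
```

Because rows are already stored newest-first, a query with a limit returns the most recent activity without any sorting at read time.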

Page 51: Introduction to cassandra 2014

Real time data access

 username | interaction_time                     | detail                                   | activity_code
----------+--------------------------------------+------------------------------------------+---------------
 pmcfadin | 9ccc9df0-d076-11e2-923e-5d8390e664ec | Entered shopping area: Jewelry           | 301
 pmcfadin | 9c652990-d076-11e2-923e-5d8390e664ec | Created shopping cart: Anniversary gifts | 202
 pmcfadin | 1b5cef90-d076-11e2-923e-5d8390e664ec | Deleted shopping cart: Gadgets I want    | 205
 pmcfadin | 1b0e5a60-d076-11e2-923e-5d8390e664ec | Opened shopping cart: Gadgets I want     | 201
 pmcfadin | 1b0be960-d076-11e2-923e-5d8390e664ec | Normal login                             | 100

select * from user_activity limit 5;

Maybe put a sale item for flowers too?

Page 52: Introduction to cassandra 2014
