Introduction to Cassandra Basics

Introduction to Cassandra Nick Bailey @nickmbailey Monday, October 28, 13

description

An introduction to some basic concepts and data modeling techniques in Cassandra.

Transcript of Introduction to Cassandra Basics

Page 1: Introduction to Cassandra Basics

Introduction to Cassandra

Nick Bailey

@nickmbailey

Monday, October 28, 13

Page 2: Introduction to Cassandra Basics

©2012 DataStax

Who am I?

Page 3: Introduction to Cassandra Basics

What’s DataStax?

Page 4: Introduction to Cassandra Basics

On to the good stuff!

Page 5: Introduction to Cassandra Basics

Why Cassandra?

Cluster Architecture

Node Architecture

Data Modeling

Wrap up

Page 6: Introduction to Cassandra Basics

Why Cassandra?

Page 7: Introduction to Cassandra Basics

Time for buzz words!

Big Data!

NoSQL!

Page 8: Introduction to Cassandra Basics

Big Data

• Gartner: “...high-volume, high-velocity and high-variety...”

• 2 sides of ‘big data’
  • Analytics
  • Real-time

Page 9: Introduction to Cassandra Basics

NoSQL

• A terrible label

• Covers a wide range of DBs
  • Cassandra
  • Redis
  • MongoDB
  • HBase
  • ...

Page 10: Introduction to Cassandra Basics

Started by Facebook

Page 11: Introduction to Cassandra Basics

Dynamo (Amazon) + Bigtable (Google)

Page 12: Introduction to Cassandra Basics

Page 13: Introduction to Cassandra Basics

Cassandra is great for...

• Massive, linear scaling (e.g. CERN hadron collider, Barracuda Networks)

• Extremely heavy writes (e.g. BlueMountain Capital – financial tick data)

• High availability (e.g. eBay, Eventbrite, Netflix, SoundCloud, HealthCare Anytime, Comcast, GoDaddy, Sony Entertainment Network)

Page 14: Introduction to Cassandra Basics

Page 15: Introduction to Cassandra Basics

Page 16: Introduction to Cassandra Basics

http://techblog.netflix.com/2012/07/lessons-netflix-learned-from-aws-storm.html

Page 17: Introduction to Cassandra Basics

One size does not fit all

Polyglot persistence

Page 18: Introduction to Cassandra Basics

More Resources

• PlanetCassandra.org

• Blog

• 5 minute interviews

Page 19: Introduction to Cassandra Basics

Cluster Architecture

Page 20: Introduction to Cassandra Basics

Data Distribution

[Figure: token ring with positions labeled 0, 25, 50, and 75]

Hash_Function(Partition Key) >> Token
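The mapping above can be sketched in a few lines of Python. This is a toy model, not Cassandra's partitioner: the node names, the 0..99 token range, and the use of MD5 are assumptions chosen to match the four tokens on the slide, and the rule that a node owns every token up to and including its own is the convention assumed here.

```python
import bisect
import hashlib

# Hypothetical 4-node ring matching the slide: each node is assigned one token.
RING = [(0, "node_a"), (25, "node_b"), (50, "node_c"), (75, "node_d")]
TOKENS = [token for token, _ in RING]


def token_for(partition_key: str) -> int:
    """Hash a partition key onto the toy token range 0..99.

    MD5 is only a stand-in hash; real partitioners use a far larger range.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % 100


def owner_of(token: int) -> str:
    """Return the node whose token is the first one >= token, wrapping around."""
    index = bisect.bisect_left(TOKENS, token)
    return RING[index % len(RING)][1]


if __name__ == "__main__":
    key = "nickmbailey"
    print(key, "->", token_for(key), "->", owner_of(token_for(key)))
```

Because the token depends only on the partition key, every row with the same partition key lands on the same node.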

Page 21: Introduction to Cassandra Basics

Replication

Page 22: Introduction to Cassandra Basics

Failure Modes

Page 23: Introduction to Cassandra Basics

Consistency Level

• Multiple options
  • ONE
  • QUORUM
  • ALL
  • LOCAL_QUORUM
  • ...

• Can be specified per request
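The arithmetic behind these levels can be sketched as follows. The function names are hypothetical, and only ONE, QUORUM, and ALL are modeled; LOCAL_QUORUM depends on data center layout and is left out. The rule of thumb assumed here is that a read sees the latest write when the replicas written plus the replicas read exceed the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """A quorum is a strict majority of the replicas."""
    return replication_factor // 2 + 1


def replicas_required(level: str, replication_factor: int) -> int:
    """How many replicas must respond for a given consistency level."""
    required = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return required[level]


def read_sees_latest_write(write_cl: str, read_cl: str, replication_factor: int) -> bool:
    """True when the write set and the read set must share at least one replica."""
    written = replicas_required(write_cl, replication_factor)
    read = replicas_required(read_cl, replication_factor)
    return written + read > replication_factor


if __name__ == "__main__":
    for write_cl, read_cl in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
        print(write_cl, read_cl, read_sees_latest_write(write_cl, read_cl, 3))
```

With a replication factor of 3, writing and reading at QUORUM touches 2 + 2 replicas, so the two sets always overlap; ONE plus ONE does not guarantee that.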

Page 24: Introduction to Cassandra Basics

Quorum

Page 25: Introduction to Cassandra Basics

Quorum

Page 26: Introduction to Cassandra Basics

Consistency

Write CL: ONE

Page 27: Introduction to Cassandra Basics

Consistency

Read CL: ONE

Page 28: Introduction to Cassandra Basics

Failure Types

• UnavailableException
  • Didn’t even try

• TimedOutException
  • Possible success or failure

Page 29: Introduction to Cassandra Basics

Multi DC

Page 30: Introduction to Cassandra Basics

Gossip

• Manages cluster state
  • Nodes up/down
  • Nodes joining/leaving

• Decentralized

Page 31: Introduction to Cassandra Basics

Snitch

• Responsible for determining cluster topology

• Tracks node responsiveness

• SimpleSnitch, PropertyFileSnitch, Ec2Snitch, etc.

Page 32: Introduction to Cassandra Basics

Node Architecture

Page 33: Introduction to Cassandra Basics

Write Path

[Figure: write path diagram with labels Write, commit log, Memtable, SSTable, Memory, Disk]
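A minimal sketch of the write path, assuming the usual reading of the diagram: a write is appended to the commit log, applied to the in-memory Memtable, and Memtables are later flushed to immutable SSTables. The class name, the flush threshold, and the in-memory lists standing in for disk files are all hypothetical simplifications.

```python
class ToyNode:
    """Toy model of one node's write path; nothing here touches a real disk."""

    def __init__(self, flush_threshold: int = 3):
        self.commit_log = []  # stands in for the append-only log on disk
        self.memtable = {}    # in-memory table, latest value per key
        self.sstables = []    # immutable, sorted snapshots standing in for disk files
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        # 1. Append to the commit log first so the write survives a crash.
        self.commit_log.append((key, value))
        # 2. Apply the write to the in-memory table.
        self.memtable[key] = value
        # 3. Flush once the memtable reaches the (hypothetical) size limit.
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memtable out as a sorted, immutable SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        # Simplification: flushed data no longer needs the commit log.
        self.commit_log.clear()


if __name__ == "__main__":
    node = ToyNode(flush_threshold=2)
    node.write("a", 1)
    node.write("b", 2)
    print(node.sstables)
```

Writes never modify an existing SSTable in this model, which is why the write path stays sequential and fast.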

Page 34: Introduction to Cassandra Basics

Read Path

[Figure: read path diagram with labels Read, Memtable, SSTable, SSTable, Memory, Disk]
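A minimal sketch of the read path, assuming each stored cell carries a write timestamp and the newest timestamp wins. The function name and the `(timestamp, value)` cell layout are hypothetical; a real node also uses indexes and filters to avoid touching every SSTable.

```python
def read(key, memtable, sstables):
    """Return the value with the newest timestamp across memory and disk.

    memtable and each sstable map key -> (timestamp, value).
    """
    candidates = [source[key] for source in [memtable, *sstables] if key in source]
    if not candidates:
        return None
    newest_timestamp, newest_value = max(candidates, key=lambda cell: cell[0])
    return newest_value


if __name__ == "__main__":
    memtable = {"k": (5, "new")}
    sstables = [{"k": (1, "old")}, {"j": (2, "x")}]
    print(read("k", memtable, sstables))
```

A read may have to consult the Memtable and several SSTables, which is one reason reads tend to cost more than writes.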

Page 35: Introduction to Cassandra Basics

Data Modeling

Page 36: Introduction to Cassandra Basics

CQL

Cassandra Query Language

Page 37: Introduction to Cassandra Basics

Terminology

• Keyspace

• Table (Column Family)

• Row

• Column

• Partition Key

• Clustering Key (Optional)

Page 38: Introduction to Cassandra Basics

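For Example:

-- Single data center: one replica of each row
CREATE KEYSPACE packagetracker WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

-- Alternative for multiple data centers: two replicas in each of dc1 and dc2
CREATE KEYSPACE packagetracker WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'dc1' : 2, 'dc2' : 2 };

-- package_id is the partition key, status_timestamp is the clustering key
CREATE TABLE events (
    package_id text,
    status_timestamp timestamp,
    location text,
    notes text,
    PRIMARY KEY (package_id, status_timestamp)
);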

Page 39: Introduction to Cassandra Basics

Constructs

Page 40: Introduction to Cassandra Basics

Basic Data Types

• blob

• int

• text

• bigint (64-bit long)

• uuid

• etc

Page 41: Introduction to Cassandra Basics

More Data Modeling Constructs

• Collections
  • map, set, list

• Time to live (TTL)

• Counters

• Secondary Indexes

Page 42: Introduction to Cassandra Basics

Approaching Data Modeling

• Model your queries, not your data
  • Optimize your data model for reads

• Don’t be afraid to denormalize

• You will get it wrong, so iterate

Page 43: Introduction to Cassandra Basics

An Example: User Logins

Page 44: Introduction to Cassandra Basics

The Query

What are the last 10 locations nickmbailey logged in from?

SELECT time, location FROM logins WHERE user = 'nickmbailey' ORDER BY time DESC LIMIT 10;

Page 45: Introduction to Cassandra Basics

The Query

SELECT time, location FROM logins WHERE user = 'nickmbailey' ORDER BY time DESC LIMIT 10;

• Partition Key: user

Page 46: Introduction to Cassandra Basics

The Query

SELECT time, location FROM logins WHERE user = 'nickmbailey' ORDER BY time DESC LIMIT 10;

• Partition Key: user
• Clustering Key: time

Page 47: Introduction to Cassandra Basics

The Query

SELECT time, location FROM logins WHERE user = 'nickmbailey' ORDER BY time DESC LIMIT 10;

• Partition Key: user
• Clustering Key: time
• Additional Columns: location

Page 48: Introduction to Cassandra Basics

The Query

SELECT time, location FROM logins WHERE user = 'nickmbailey' ORDER BY time DESC LIMIT 10;

CREATE TABLE logins ( user text, time timestamp, location text, PRIMARY KEY (user, time));

• Partition Key: user
• Clustering Key: time
• Additional Columns: location

Page 49: Introduction to Cassandra Basics

The Query

SELECT time, location FROM logins WHERE user = 'nickmbailey' ORDER BY time DESC LIMIT 10;

CREATE TABLE logins ( user text, time timestamp, location text, PRIMARY KEY (user, time));

User        | Time                | Location
------------+---------------------+----------------------
nickmbailey | 2013-07-19 09:22:18 | Austin, Texas
nickmbailey | 2013-07-19 14:49:27 | Blacksburg, Virginia
jsmith      | 2013-07-20 07:59:34 | Atlanta, Georgia

Partition key: User. Primary key: (User, Time).
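The effect of this primary key can be sketched in Python: rows are grouped by the partition key (user) and kept in order by the clustering key (time), so the "last 10" query reads one partition and takes a slice. The class and method names are hypothetical, and timestamps are kept as ISO-style strings so that string order matches time order.

```python
from collections import defaultdict


class LoginsTable:
    """Toy model of PRIMARY KEY (user, time): partition by user, order by time."""

    def __init__(self):
        # user -> {time: location}; one inner dict per partition
        self._partitions = defaultdict(dict)

    def insert(self, user, time, location):
        self._partitions[user][time] = location

    def last_logins(self, user, limit=10):
        """Return up to `limit` (time, location) pairs, newest first."""
        rows = self._partitions.get(user, {})
        newest_first = sorted(rows.items(), reverse=True)
        return newest_first[:limit]


if __name__ == "__main__":
    table = LoginsTable()
    table.insert("nickmbailey", "2013-07-19 09:22:18", "Austin, Texas")
    table.insert("nickmbailey", "2013-07-19 14:49:27", "Blacksburg, Virginia")
    table.insert("jsmith", "2013-07-20 07:59:34", "Atlanta, Georgia")
    print(table.last_logins("nickmbailey"))
```

The query never looks at jsmith's rows, which is the point of choosing user as the partition key.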

Page 50: Introduction to Cassandra Basics

Time-series data

• By far, the most common data model

• Event logs

• Metrics

• Sensor Data

• Etc

Page 51: Introduction to Cassandra Basics

Another Query

When was the last time nickmbailey logged in from San Francisco, California?

SELECT time FROM logins WHERE user = 'nickmbailey' AND location = 'San Francisco, California';

User        | Time                | Location
------------+---------------------+---------------------------
nickmbailey | 2013-07-19 09:22:18 | Austin, Texas
nickmbailey | 2013-07-19 14:49:27 | Blacksburg, Virginia
nickmbailey | 2013-07-19 14:49:27 | Austin, Texas
nickmbailey | 2013-05-19 14:49:27 | Austin, Texas
nickmbailey | 2013-04-19 14:49:27 | San Francisco, California
...         | ...                 | ...
jsmith      | 2013-07-20 07:59:34 | Atlanta, Georgia

Page 52: Introduction to Cassandra Basics

Another Query

When was the last time nickmbailey logged in from San Francisco, California?

SELECT time FROM logins_by_location WHERE user = 'nickmbailey' AND location = 'San Francisco, California';

CREATE TABLE logins_by_location ( user text, time timestamp, location text, PRIMARY KEY (user, location));

Page 53: Introduction to Cassandra Basics

Another Query

SELECT time FROM logins_by_location WHERE user = 'nickmbailey' AND location = 'San Francisco, California';

CREATE TABLE logins_by_location ( user text, time timestamp, location text, PRIMARY KEY (user, location));

User        | Location                  | Time
------------+---------------------------+---------------------
nickmbailey | Austin, Texas             | 2013-07-19 09:22:18
nickmbailey | Blacksburg, Virginia      | 2013-07-19 14:49:27
nickmbailey | San Francisco, California | 2013-07-19 14:49:27
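With PRIMARY KEY (user, location) there is one row per user and location, and a later insert for the same pair replaces the stored time. A minimal Python sketch of that behavior follows; the class and method names are hypothetical, and the sketch assumes logins are written in time order (in Cassandra the write timestamp decides which value wins).

```python
class LoginsByLocation:
    """Toy model of PRIMARY KEY (user, location): one row per (user, location)."""

    def __init__(self):
        self._rows = {}  # (user, location) -> time of the most recent insert

    def insert(self, user, location, time):
        # An insert with an existing primary key overwrites the old row.
        self._rows[(user, location)] = time

    def last_login(self, user, location):
        return self._rows.get((user, location))

    def row_count(self):
        return len(self._rows)


if __name__ == "__main__":
    table = LoginsByLocation()
    table.insert("nickmbailey", "San Francisco, California", "2013-04-19 14:49:27")
    table.insert("nickmbailey", "San Francisco, California", "2013-07-19 14:49:27")
    print(table.last_login("nickmbailey", "San Francisco, California"))
```

The table holds only the most recent login per location, so it answers this query with a single-row lookup but cannot replace the original logins table.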

Page 54: Introduction to Cassandra Basics

Denormalize

• Create materialized views of the same data to support different queries

• Storage space is cheap, Cassandra is fast

Page 55: Introduction to Cassandra Basics

Debugging your data model

cqlsh> tracing on;
Now tracing requests.

cqlsh:foo> INSERT INTO test (a, b) VALUES (1, 'example');
Tracing session: 4ad36250-1eb4-11e2-0000-fe8ebeead9f9

 activity                            | timestamp    | source    | source_elapsed
-------------------------------------+--------------+-----------+----------------
 execute_cql3_query                  | 00:02:37,015 | 127.0.0.1 |              0
 Parsing statement                   | 00:02:37,015 | 127.0.0.1 |             81
 Preparing statement                 | 00:02:37,015 | 127.0.0.1 |            273
 Determining replicas for mutation   | 00:02:37,015 | 127.0.0.1 |            540
 Sending message to /127.0.0.2       | 00:02:37,015 | 127.0.0.1 |            779
 Messsage received from /127.0.0.1   | 00:02:37,016 | 127.0.0.2 |             63
 Applying mutation                   | 00:02:37,016 | 127.0.0.2 |            220
 Acquiring switchLock                | 00:02:37,016 | 127.0.0.2 |            250
 Appending to commitlog              | 00:02:37,016 | 127.0.0.2 |            277
 Adding to memtable                  | 00:02:37,016 | 127.0.0.2 |            378
 Enqueuing response to /127.0.0.1    | 00:02:37,016 | 127.0.0.2 |            710
 Sending message to /127.0.0.1       | 00:02:37,016 | 127.0.0.2 |            888

Page 56: Introduction to Cassandra Basics

A note on Transactions

• In general, you want to design your data model so that you don’t need them

• The latest version of Cassandra has ‘Compare and swap’
  • An implementation of Paxos
  • ...IF NOT EXISTS;
  • ...IF column1 = 'value';
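The semantics of the two conditional clauses can be sketched with a single in-process row. This only illustrates the compare-and-swap contract (apply the change only if the condition holds, and report whether it was applied); it does not model Paxos or replicas, and the class and method names are hypothetical.

```python
class ToyRow:
    """Single-process stand-in for one row with conditional writes."""

    def __init__(self):
        self._columns = None  # None means the row does not exist yet

    def insert_if_not_exists(self, columns):
        """Mimics INSERT ... IF NOT EXISTS: succeed only when the row is absent."""
        if self._columns is not None:
            return False
        self._columns = dict(columns)
        return True

    def update_if(self, column, expected, new_value):
        """Mimics UPDATE ... IF column = expected: succeed only on a match."""
        if self._columns is None or self._columns.get(column) != expected:
            return False
        self._columns[column] = new_value
        return True

    def get(self, column):
        return None if self._columns is None else self._columns.get(column)


if __name__ == "__main__":
    row = ToyRow()
    print(row.insert_if_not_exists({"column1": "value"}))
    print(row.insert_if_not_exists({"column1": "other"}))
```

The second insert is rejected and leaves the row unchanged, which is the behavior a uniqueness check such as reserving a username relies on.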

Page 57: Introduction to Cassandra Basics

Try it out

Page 58: Introduction to Cassandra Basics

CCM

• CCM - Cassandra Cluster Manager
  • https://github.com/pcmanus/ccm

• Warning: not lightweight

• Example:
  • ccm create test -v 2.0.1
  • ccm populate -n 3
  • ccm start

Page 59: Introduction to Cassandra Basics

Clients

• Cqlsh
  • Bundled with Cassandra

• Drivers
  • java: https://github.com/datastax/java-driver
  • python: https://github.com/datastax/python-driver
  • .net: https://github.com/datastax/csharp-driver
  • and more: http://www.datastax.com/download/clientdrivers

Page 60: Introduction to Cassandra Basics

Get Help

• IRC: #cassandra on freenode

• Mailing Lists

• Stack Overflow

• DataStax Docs
  • http://www.datastax.com/docs

Page 61: Introduction to Cassandra Basics

Questions?

Page 62: Introduction to Cassandra Basics