Apache Cassandra - A gentle introduction

A gentle introduction by @przemur from Tuesday, October 22, 13

A presentation about Cassandra, given by Przemyslaw Maciolek at the DataKRK meetup: www.meetup.com/datakrk/events/145043192/

Transcript of Apache Cassandra - A gentle introduction

Page 3: Apache Cassandra - A gentle introduction

PERFORMANCE

Page 4: Apache Cassandra - A gentle introduction

http://www.datastax.com/wp-content/uploads/2013/02/WP-Benchmarking-Top-NoSQL-Databases.pdf

Page 5: Apache Cassandra - A gentle introduction

http://www.datastax.com/wp-content/uploads/2013/02/WP-Benchmarking-Top-NoSQL-Databases.pdf

Page 6: Apache Cassandra - A gentle introduction

http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

Page 7: Apache Cassandra - A gentle introduction

http://www.networkworld.com/news/tech/2012/102212-nosql-263595.html

Page 8: Apache Cassandra - A gentle introduction

http://www.cubrid.org/blog/dev-platform/nosql-benchmarking/

Page 10: Apache Cassandra - A gentle introduction

A TAXONOMY OF DISTRIBUTED DATABASES

Page 16: Apache Cassandra - A gentle introduction

Relational (MySQL, Oracle, ...)

ID  FIRST  LAST
1   John   Smith
2   Mike   Kowalski

Key-Value (Redis, Riak, Dynamo, ...)

:name_1 -> “John Smith”
:name_2 -> “Mike Kowalski”

Document (MongoDB, Couchbase, ...)

Employee
Name: John Smith
ID: 1

Employee
Name: Mike Kowalski
ID: 2

Wide Column (BigTable, Cassandra, HBase, ...)

Company  Employee:1:Name  Employee:2:Name
ACME     John Smith       Mike Kowalski

Graph (Neo4j, ...)

John Smith -- works with -- Mike Kowalski

Page 21: Apache Cassandra - A gentle introduction

Consistency, Availability, Partition tolerance

“Pick any two” (and have acceptable latency)

• RDBMSs
• Immediate Consistency: HBase, ...
• Eventual Consistency: Cassandra, Riak, ...
• + Configurable (MongoDB, Cassandra - to some extent, ...)

Page 23: Apache Cassandra - A gentle introduction

KEY IDEAS

Page 24: Apache Cassandra - A gentle introduction

• Dynamo partitioning + BigTable model
• simple architecture, minimal administration
• no single point of failure
• closer to the metal (e.g. no HDFS)
• low latency

Page 25: Apache Cassandra - A gentle introduction

CASSANDRA’S DATA MODEL

Page 31: Apache Cassandra - A gentle introduction

Cassandra term        Rough equivalent
Keyspace              “Database”
Column Family         “Table”
Row (Partition) Key   “Primary ID”
Column Name           Sorted “Column”
Value                 “Value”
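The hierarchy above can be pictured as nested maps. The sketch below is only a mental model in Python, not how Cassandra lays data out on disk; the `books` column family, its contents, and the `read_row` helper are made up for illustration.

```python
# Illustrative in-memory picture of keyspace -> column family -> row -> columns.
# All names and values here are hypothetical examples.
keyspace = {
    "books": {                     # column family ("table")
        "George Orwell": {         # row (partition) key
            "Animal Farm": 1945,   # column name -> value
            "1984": 1949,
        },
    },
}


def read_row(keyspace, column_family, row_key):
    """Return the (column name, value) pairs of one row, sorted by column name."""
    row = keyspace[column_family][row_key]
    return sorted(row.items())


print(read_row(keyspace, "books", "George Orwell"))
```

The point of the sort is that columns inside a row come back ordered by column name, which is what makes range reads within one row cheap.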

Page 32: Apache Cassandra - A gentle introduction

PARTITIONING

Page 34: Apache Cassandra - A gentle introduction

TWO PARTITIONERS OUT OF THE BOX

• Byte Ordered Partitioner: forget it (hot spots, uneven distribution, load balancing problems)
• Random Partitioner

http://www.datastax.com/docs/1.0/cluster_architecture/partitioning

Page 37: Apache Cassandra - A gentle introduction

Node  Initial token
1     aaa
2     bbb
3     xxx
4     zzz

Range: [aaa,bbb)
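A minimal sketch of how a token maps to a node on this ring. It follows the slide's simplification that a node covers the half-open range starting at its own initial token, e.g. [aaa,bbb) for node 1; real Cassandra assigns each node the range ending at its token instead. The `RING` constant and the `owner` function are illustrative names.

```python
from bisect import bisect_right

# (initial token, node number) pairs taken from the slide.
RING = [("aaa", 1), ("bbb", 2), ("xxx", 3), ("zzz", 4)]


def owner(token, ring=RING):
    """Return the node whose range [own token, next token) contains `token`.

    Assumption: slide-style ranges. Tokens sorting before the first
    initial token wrap around to the last node on the ring.
    """
    ordered = sorted(ring)
    tokens = [t for t, _ in ordered]
    index = bisect_right(tokens, token) - 1  # -1 wraps to the last node
    return ordered[index][1]


for key in ("abc", "klm", "xyz"):
    print(key, "-> node", owner(key))
```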

Page 43: Apache Cassandra - A gentle introduction

Row Key  Hash
abc      ...
klm      ...
xyz      ...

Ring nodes and initial tokens: 1 (aaa), 2 (bbb), 3 (xxx), 4 (zzz)

Row keys placed on the ring by their hash: abc, klm, xyz
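A rough sketch of hash-based placement, in the spirit of the Random Partitioner: the row key is hashed with MD5 and the resulting number decides the position on the ring. The token arithmetic below is a simplification and not Cassandra's exact computation; the four evenly spaced tokens and all function names are assumptions made for the example.

```python
import hashlib
from bisect import bisect_right

TOKEN_SPACE = 2 ** 127  # assumed size of the token space for this sketch


def token_for(row_key):
    """Hash a row key to an integer token (simplified, illustrative)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % TOKEN_SPACE


def evenly_spaced_tokens(node_count):
    """Initial tokens that split the token space into equal ranges."""
    return [i * TOKEN_SPACE // node_count for i in range(node_count)]


def node_for(row_key, tokens):
    """Return the 1-based node whose range [token, next token) holds the key."""
    index = bisect_right(tokens, token_for(row_key)) - 1
    return index + 1


TOKENS = evenly_spaced_tokens(4)
for key in ("abc", "klm", "xyz"):
    print(key, "-> node", node_for(key, TOKENS))
```

Because the hash scatters similar keys across the whole token space, neighbouring keys such as abc and abd usually end up on different nodes, which is what avoids the hot spots of ordered partitioning.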

Page 44: Apache Cassandra - A gentle introduction

WHAT ABOUT THE REPLICATION!?

Page 48: Apache Cassandra - A gentle introduction

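Replication Factor = 2

Ring nodes and initial tokens: 1 (aaa), 2 (bbb), 3 (xxx), 4 (zzz)

Data held by the four nodes (each key on two consecutive nodes): abc | abc, klm | klm, xyz | xyz

Warning: greatly simplified. Check out the snitch docs for more info.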
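The picture can be reproduced with a few lines of Python. This is a sketch of the simplest placement rule only (the primary node plus the next nodes clockwise), with no awareness of racks or data centers, which is where snitches come in. The ring positions 0 to 3 and the `primaries` mapping are hypothetical.

```python
def replicas(primary, node_count, replication_factor):
    """Ring positions holding a key: the primary and the next nodes clockwise."""
    copies = min(replication_factor, node_count)
    return [(primary + i) % node_count for i in range(copies)]


def placement(primaries, node_count, replication_factor):
    """Map each ring position to the list of keys stored there."""
    nodes = {n: [] for n in range(node_count)}
    for key, primary in primaries.items():
        for n in replicas(primary, node_count, replication_factor):
            nodes[n].append(key)
    return nodes


# Hypothetical primary positions of the three example keys on a 4-node ring.
primaries = {"abc": 0, "klm": 1, "xyz": 2}
print(placement(primaries, 4, 2))
print(placement(primaries, 4, 3))
```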

Page 53: Apache Cassandra - A gentle introduction

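Replication Factor = 3

Ring nodes and initial tokens: 1 (aaa), 2 (bbb), 3 (xxx), 4 (zzz)

Data held by the four nodes (each key on three consecutive nodes): abc, xyz | abc, klm | abc, klm, xyz | klm, xyz

BTW, QUORUM = floor(RF/2) + 1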
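The quorum formula as a small worked example. The division is an integer division, so RF = 3 gives a quorum of 2. The second helper states the usual rule of thumb that a read sees the latest write when the read and write replica counts together exceed RF; both function names are illustrative.

```python
def quorum(replication_factor):
    """QUORUM = floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, write_replicas, read_replicas):
    """True when every read set overlaps every write set (R + W > RF)."""
    return write_replicas + read_replicas > replication_factor


for rf in (1, 2, 3, 5):
    print("RF =", rf, "-> QUORUM =", quorum(rf))
```

With RF = 3, writing and reading at QUORUM touches 2 + 2 = 4 > 3 replicas, so at least one replica in every read has the latest write.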

Page 55: Apache Cassandra - A gentle introduction

WHAT HAPPENS WHEN A NEW NODE IS ADDED?

Node  Initial token
1     aaa
2     bbb
3     xxx
4     zzz
5     ???

Page 56: Apache Cassandra - A gentle introduction

VNODES

Node 1: aaa, ccc, ggg
Node 2: bbb, vvv, mmm
Node 3: xxx, uuu, jjj
Node 4: zzz, eee, ddd

Page 57: Apache Cassandra - A gentle introduction

VNODES

Node 1: aaa, ccc, ggg
Node 2: bbb, vvv, mmm
Node 3: xxx, uuu, jjj
Node 4: zzz, eee, ddd
Node 5: (new, no tokens yet)

Page 59: Apache Cassandra - A gentle introduction

Node 1: aaa, ccc
Node 2: bbb, vvv
Node 3: xxx, uuu, jjj
Node 4: zzz, eee, ddd
Node 5: ggg, mmm

This also helps greatly when a node is down.
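The rebalancing shown on the slides can be sketched as follows. This mirrors the slide's simplified picture, in which the new node takes over existing token ranges from the most loaded nodes; it is not a description of how Cassandra actually picks tokens for a joining node. The `add_node` function and the dictionary layout are assumptions made for the example.

```python
def add_node(ring, new_node):
    """Return a new ring where `new_node` takes vnodes from the busiest nodes.

    `ring` maps node number -> list of tokens. The new node receives tokens
    one at a time until it holds the (rounded down) average share.
    """
    ring = {node: list(tokens) for node, tokens in ring.items()}
    ring[new_node] = []
    target = sum(len(tokens) for tokens in ring.values()) // len(ring)
    while len(ring[new_node]) < target:
        donors = [node for node in ring if node != new_node]
        donor = max(donors, key=lambda node: len(ring[node]))
        ring[new_node].append(ring[donor].pop())
    return ring


before = {
    1: ["aaa", "ccc", "ggg"],
    2: ["bbb", "vvv", "mmm"],
    3: ["xxx", "uuu", "jjj"],
    4: ["zzz", "eee", "ddd"],
}
after = add_node(before, 5)
print(after)
```

Since the new node's share comes from several donors, the streaming work is spread across the cluster instead of falling on a single neighbour; the same holds in reverse when a node goes down.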

Page 60: Apache Cassandra - A gentle introduction

CASSANDRA 101

Page 61: Apache Cassandra - A gentle introduction

INSTALLATION & CONFIGURATION

Page 62: Apache Cassandra - A gentle introduction

CQL3

SELECT * FROM books;

INSERT INTO books (author, title, year)
VALUES ('Herman Melville', 'Moby-Dick', 1851);

DELETE FROM books WHERE author = 'Paulo Coelho';

Page 63: Apache Cassandra - A gentle introduction

DATA MODELING PRACTICES

COMPOSITE COLUMNS

Page 66: Apache Cassandra - A gentle introduction

Author         Book         Year  Number of words
George Orwell  Animal Farm  1945  32451
George Orwell  1984         1949  110581
James Joyce    Ulysses      1922  265192

CREATE TABLE books (
  author varchar,
  title varchar,
  year int,
  number_of_words int,
  PRIMARY KEY (author, title)
);

Stored as one wide row per author, with composite column names:

George Orwell
  [1984, Year]: 1949
  [1984, Number of words]: 110581
  [Animal Farm, Year]: 1945
  [Animal Farm, Number of words]: 32451

James Joyce
  [Ulysses, Year]: 1922
  [Ulysses, Number of words]: 265192
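A small Python sketch of the mapping from table rows to composite columns. It is only an illustration of the idea: the first part of the primary key (author) selects the wide row, and the second part (title) becomes the prefix of each column name. The `to_wide_rows` function and the tuple layout are assumptions, not Cassandra's storage format.

```python
from collections import defaultdict

# (author, title, year, number_of_words), as in the table above.
ROWS = [
    ("George Orwell", "Animal Farm", 1945, 32451),
    ("George Orwell", "1984", 1949, 110581),
    ("James Joyce", "Ulysses", 1922, 265192),
]


def to_wide_rows(rows):
    """Group rows by author and emit sorted ((title, column), value) pairs."""
    wide = defaultdict(dict)
    for author, title, year, words in rows:
        wide[author][(title, "year")] = year
        wide[author][(title, "number_of_words")] = words
    # Columns inside one wide row are kept sorted by their composite name.
    return {author: sorted(columns.items()) for author, columns in wide.items()}


for author, columns in to_wide_rows(ROWS).items():
    print(author, columns)
```

Because the composite names sort by title first, all columns of one book sit next to each other, so reading one book or a range of titles for an author is a single contiguous slice.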

Page 68: Apache Cassandra - A gentle introduction

SETS, LISTS, MAPS

Page 69: Apache Cassandra - A gentle introduction

CONSOLE TIME

Page 70: Apache Cassandra - A gentle introduction

WHAT DID WE HIT?

Page 71: Apache Cassandra - A gentle introduction

• no DESCRIBE when calling from a client
• cache settings
• insertion performance with hundreds of thousands of columns
• PRIMARY KEY((a,b,c),d)
• compaction settings

Page 72: Apache Cassandra - A gentle introduction

I WANT TO KNOW MORE

Page 74: Apache Cassandra - A gentle introduction

BY THE WAY...

Page 75: Apache Cassandra - A gentle introduction

HAVE YOU SEEN HIM?

getbase.com