Apache Cassandra - A gentle introduction
Uploaded by przemek-maciolek
A gentle introduction by @przemur from
Tuesday, October 22, 13
PERFORMANCE
• http://www.datastax.com/wp-content/uploads/2013/02/WP-Benchmarking-Top-NoSQL-Databases.pdf
• http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
• http://www.networkworld.com/news/tech/2012/102212-nosql-263595.html
• http://www.cubrid.org/blog/dev-platform/nosql-benchmarking/
A TAXONOMY OF DISTRIBUTED DATABASES
Relational (MySQL, Oracle, ...)
ID  FIRST  LAST
1   John   Smith
2   Mike   Kowalski

Key-Value (Redis, Riak, Dynamo, ...)
:name_1 -> “John Smith”
:name_2 -> “Mike Kowalski”

Wide Column (BigTable, Cassandra, HBase, ...)
Company  Employee:1:Name  Employee:2:Name
ACME     John Smith       Mike Kowalski

Document (MongoDB, Couchbase, ...)
Employee
  Name: John Smith
  ID: 1
Employee
  Name: Mike Kowalski
  ID: 2

Graph (Neo4j, ...)
John Smith -- works with -- Mike Kowalski
Consistency, Availability, Partition tolerance
“Pick any two” (and have acceptable latency)
• RDBMSs
• Immediate Consistency: HBase, ...
• Eventual Consistency: Cassandra, Riak, ...
• + Configurable (MongoDB, Cassandra - to some extent, ...)
OH REALLY?
• Cassandra vs. Consistency:
  http://aphyr.com/posts/294-call-me-maybe-cassandra
• CAP criticism:
  http://aphyr.com/posts/292-call-me-maybe-nuodb
  http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
  http://www.percona.com/live/mysql-conference-2013/sites/default/files/slides/aslett%20cap%20theorem.pdf
KEY IDEAS
• Dynamo partitioning + BigTable model
• simple architecture, minimal administration
• no single point of failure
• closer to the metal (e.g. no HDFS)
• low latency
CASSANDRA’S DATA MODEL
Cassandra term        Rough equivalent
Keyspace              “Database”
Column Family         “Table”
Row (Partition) Key   “Primary ID”
Column Name           Sorted “Column”
Value                 “Value”
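The hierarchy above can be pictured as nested maps. The following is only a toy sketch in Python, not Cassandra's actual storage engine: the `ColumnFamily` class, the `employees` name and the sample data (borrowed from the taxonomy slide) are illustrative assumptions. It shows the one property the slide stresses: columns within a row are kept sorted by column name.

```python
from bisect import insort


class ColumnFamily:
    """Toy column family: row key -> columns kept sorted by column name."""

    def __init__(self):
        # row (partition) key -> list of (column name, value), sorted by name
        self.rows = {}

    def insert(self, row_key, column_name, value):
        columns = self.rows.setdefault(row_key, [])
        # Overwrite a column that already exists under this name.
        for i, (name, _) in enumerate(columns):
            if name == column_name:
                columns[i] = (column_name, value)
                return
        # Otherwise insert at the position that keeps the list sorted.
        insort(columns, (column_name, value))

    def get_row(self, row_key):
        return list(self.rows.get(row_key, []))


# A keyspace is modelled here as a plain dict: column family name -> column family.
keyspace = {"employees": ColumnFamily()}
cf = keyspace["employees"]
cf.insert("ACME", "Employee:2:Name", "Mike Kowalski")
cf.insert("ACME", "Employee:1:Name", "John Smith")
print(cf.get_row("ACME"))
# [('Employee:1:Name', 'John Smith'), ('Employee:2:Name', 'Mike Kowalski')]
```

The columns come back in name order even though they were inserted out of order.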
PARTITIONING
TWO PARTITIONERS OUT OF THE BOX
• Byte Ordered Partitioner
  Forget it:
  • hot spots
  • uneven distribution
  • load balancing
• Random Partitioner
http://www.datastax.com/docs/1.0/cluster_architecture/partitioning
Ring of four nodes, each with an initial token:
Node  Initial token
1     aaa
2     bbb
3     xxx
4     zzz
Range: [aaa,bbb)

Row Key  Hash
abc      ...
klm      ...
xyz      ...
The row keys abc, klm and xyz are then placed on the ring.
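A minimal Python sketch of the lookup pictured on the ring slides. It follows the slide's convention that a node owns the range from its own initial token up to the next node's token, e.g. [aaa,bbb). Real Cassandra differs in detail, so read this as an illustration only. `owner`, `md5_token` and `HASHED_RING` are hypothetical names, and the evenly spaced integer tokens are an assumption made for the hashed variant.

```python
import hashlib
from bisect import bisect_right

# (initial token, node), sorted by token, as drawn on the slides.
RING = [("aaa", 1), ("bbb", 2), ("xxx", 3), ("zzz", 4)]


def owner(ring, token):
    """Return the node whose range [own token, next token) contains `token`."""
    tokens = [t for t, _ in ring]
    # Index of the last initial token <= token; -1 wraps around to the last node.
    index = bisect_right(tokens, token) - 1
    return ring[index][1]


def md5_token(row_key):
    """Hash a row key to an integer token, in the spirit of the Random Partitioner."""
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % 2**127


# Byte-ordered style: the row key itself is compared with the tokens.
for key in ("abc", "klm", "xyz"):
    print(key, "-> node", owner(RING, key))
# abc -> node 1
# klm -> node 2
# xyz -> node 3

# Hashed style: four evenly spaced integer tokens (an assumption for this sketch).
HASHED_RING = [(i * 2**127 // 4, i + 1) for i in range(4)]
for key in ("abc", "klm", "xyz"):
    print(key, "-> node", owner(HASHED_RING, md5_token(key)))
```

With byte-ordered tokens, similar keys land on the same node, which is where the hot spots come from. Hashing the key first spreads them around the ring.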
WHAT ABOUT THE REPLICATION!?
Replication Factor = 2
Ring (node: initial token): 1: aaa, 2: bbb, 3: xxx, 4: zzz
Each of the row keys abc, klm and xyz is now stored on two nodes.
Warning: greatly simplified. Check out the snitch docs for more info.
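The same simplification can be written down in a few lines of Python. This sketch assumes the simplest placement rule: the node owning the token plus the next nodes clockwise on the ring. It ignores snitches, racks and data centres entirely. `replicas` is a hypothetical helper, not a Cassandra API.

```python
from bisect import bisect_right

# (initial token, node), sorted by token, as drawn on the slides.
RING = [("aaa", 1), ("bbb", 2), ("xxx", 3), ("zzz", 4)]


def replicas(ring, token, replication_factor):
    """Return the owning node followed by the next nodes clockwise on the ring."""
    tokens = [t for t, _ in ring]
    start = bisect_right(tokens, token) - 1  # -1 means the last node owns the token
    count = min(replication_factor, len(ring))  # cannot have more replicas than nodes
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


for key in ("abc", "klm", "xyz"):
    print(key, "RF=2:", replicas(RING, key, 2), "RF=3:", replicas(RING, key, 3))
# abc RF=2: [1, 2] RF=3: [1, 2, 3]
# klm RF=2: [2, 3] RF=3: [2, 3, 4]
# xyz RF=2: [3, 4] RF=3: [3, 4, 1]
```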
Replication Factor = 3
Ring (node: initial token): 1: aaa, 2: bbb, 3: xxx, 4: zzz
Each of the row keys abc, klm and xyz is now stored on three nodes.
BTW, QUORUM = (RF/2)+1 (integer division, so RF = 3 gives QUORUM = 2)
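The quorum formula is easy to check by hand. A small Python sketch follows, with `quorum` and `reads_see_latest_write` as hypothetical helper names. The second function encodes the usual rule of thumb that a read overlaps the latest write when the number of replicas written plus the number read exceeds RF.

```python
def quorum(replication_factor):
    """QUORUM = (RF / 2) + 1, using integer division."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, write_replicas, read_replicas):
    """True when every read set must overlap every write set (W + R > RF)."""
    return write_replicas + read_replicas > replication_factor


for rf in range(1, 6):
    print("RF =", rf, "QUORUM =", quorum(rf))
# RF = 1 QUORUM = 1
# RF = 2 QUORUM = 2
# RF = 3 QUORUM = 2
# RF = 4 QUORUM = 3
# RF = 5 QUORUM = 3

# Quorum writes plus quorum reads always overlap; single-replica reads and writes do not.
print(reads_see_latest_write(3, quorum(3), quorum(3)))  # True
print(reads_see_latest_write(3, 1, 1))                  # False
```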
WHAT HAPPENS WHEN A NEW NODE IS BEING ADDED?
Ring (node: initial token): 1: aaa, 2: bbb, 3: xxx, 4: zzz, 5: ???
VNODES
Before node 5 joins:
Node  Tokens
1     aaa, ccc, ggg
2     bbb, vvv, mmm
3     xxx, uuu, jjj
4     zzz, eee, ddd

After node 5 joins:
Node  Tokens
1     aaa, ccc
2     bbb, vvv
3     xxx, uuu, jjj
4     zzz, eee, ddd
5     ggg, mmm
This also helps greatly when a node is down.
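A toy Python sketch of the rebalancing shown on the vnode slides. It assumes a deliberately naive rule: the new node takes one token at a time from whichever existing node currently holds the most. Real Cassandra assigns vnode tokens differently, so `add_node` is a hypothetical helper that only illustrates the idea that a new node relieves several existing nodes at once.

```python
def add_node(assignment, new_node):
    """Return a new token assignment in which `new_node` takes tokens from the busiest nodes."""
    result = {node: list(tokens) for node, tokens in assignment.items()}
    total = sum(len(tokens) for tokens in result.values())
    target = total // (len(result) + 1)  # fair share for the newcomer, rounded down
    result[new_node] = []
    while len(result[new_node]) < target:
        # Pick the existing node holding the most tokens (first one wins on ties).
        donor = max((n for n in result if n != new_node), key=lambda n: len(result[n]))
        result[new_node].append(result[donor].pop())
    return result


before = {
    1: ["aaa", "ccc", "ggg"],
    2: ["bbb", "vvv", "mmm"],
    3: ["xxx", "uuu", "jjj"],
    4: ["zzz", "eee", "ddd"],
}
after = add_node(before, 5)
for node, tokens in after.items():
    print(node, tokens)
# 1 ['aaa', 'ccc']
# 2 ['bbb', 'vvv']
# 3 ['xxx', 'uuu', 'jjj']
# 4 ['zzz', 'eee', 'ddd']
# 5 ['ggg', 'mmm']
```

With a single token per node, node 5 could only split one neighbour's range. With several tokens per node, the load it takes over is drawn from more than one node.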
CASSANDRA 101
INSTALLATION & CONFIGURATION
CQL3
SELECT * FROM books;
INSERT INTO books (author, title, year)
VALUES ('Herman Melville', 'Moby-Dick', 1851);
DELETE FROM books WHERE author = 'Paulo Coelho';
DATA MODELING PRACTICES
COMPOSITE COLUMNS
Author         Book         Year  Number of words
George Orwell  Animal Farm  1945  32451
George Orwell  1984         1949  110581
James Joyce    Ulysses      1922  265192

CREATE TABLE books (
    author varchar,
    title varchar,
    year int,
    number_of_words int,
    PRIMARY KEY (author, title)
);

Row key        Columns
George Orwell  [1984, Year]: 1949
               [1984, Number of words]: 110581
               [Animal Farm, Year]: 1945
               [Animal Farm, Number of words]: 32451
James Joyce    [Ulysses, Year]: 1922
               [Ulysses, Number of words]: 265192
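The mapping from CQL rows to composite columns can be sketched in Python. This is an illustration of the idea only, not Cassandra's on-disk format. `to_wide_rows` is a hypothetical helper, and the column names follow the CREATE TABLE statement above.

```python
def to_wide_rows(rows, partition_key, clustering_key):
    """Group rows by partition key; each remaining value is stored under a
    composite column name (clustering value, column name), sorted by that name."""
    wide = {}
    for row in rows:
        cells = wide.setdefault(row[partition_key], {})
        for column, value in row.items():
            if column not in (partition_key, clustering_key):
                cells[(row[clustering_key], column)] = value
    # Sort each wide row by its composite column names.
    return {key: dict(sorted(cells.items())) for key, cells in wide.items()}


books = [
    {"author": "George Orwell", "title": "Animal Farm", "year": 1945, "number_of_words": 32451},
    {"author": "George Orwell", "title": "1984", "year": 1949, "number_of_words": 110581},
    {"author": "James Joyce", "title": "Ulysses", "year": 1922, "number_of_words": 265192},
]

wide_rows = to_wide_rows(books, partition_key="author", clustering_key="title")
for row_key, cells in wide_rows.items():
    print(row_key)
    for (title, column), value in cells.items():
        print("   [%s, %s]: %s" % (title, column, value))
```

Both George Orwell books end up in a single wide row, with the title as the first component of every column name. That is why PRIMARY KEY (author, title) makes lookups by author cheap.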
COUNTERS
http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011
SETS, LISTS, MAPS
CONSOLE TIME
WHAT DID WE HIT?
• no DESCRIBE when calling from a client
• cache settings
• insertion performance with hundreds of thousands of columns
• PRIMARY KEY((a,b,c),d)
• compaction settings
I WANT TO KNOW MORE
• http://wiki.apache.org/cassandra/ArchitectureOverview
• http://www.datastax.com/documentation/cql/3.0/webhelp/index.html
• http://cassandra.apache.org/doc/cql3/CQL.html
• http://www.slideshare.net/acunu/freakin-fast-cassandra
• http://nosql.mypopescu.com/
• http://planetcassandra.org/
BY THE WAY...
HAVE YOU SEEN HIM?
getbase.com