HRX Meetup Group 8/20/2014: Cassandra and How to Scale your Database
Uploaded by planet-cassandra
Cassandra
Pretty Cool
History
● Google Bigtable
● Amazon Dynamo
● Today
Why Should You Care
● Horizontal Scaling (basically auto sharding)
● Multiple Nodes - Highly Available
● Really Fast Writes
● Not too shabby at reads either - SLICES!!
● Bright Future
The Cluster
● replication factor (rf)
● read consistency (r)
● write consistency (w)
● clustering - shard on partition key
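How rf, r and w interact can be sketched in a few lines of Python. This is only an illustration, not something from the slides: the rule `r + w > rf` (every read set overlaps every write set) is the standard rule of thumb for Dynamo-style stores, and the helper names `quorum` and `overlaps` are made up for this example.

```python
def quorum(rf: int) -> int:
    """Smallest replica count that is a strict majority of rf."""
    return rf // 2 + 1


def overlaps(rf: int, r: int, w: int) -> bool:
    """True when any r replicas read must include one of the w replicas written."""
    if not (1 <= r <= rf and 1 <= w <= rf):
        raise ValueError("r and w must be between 1 and rf")
    return r + w > rf


rf = 3
print(quorum(rf))                            # 2
print(overlaps(rf, quorum(rf), quorum(rf)))  # True: quorum reads see quorum writes
print(overlaps(rf, 1, 1))                    # False: a read may miss the latest write
```

With rf = 3, reading and writing at quorum (2 replicas each) overlaps, while reading and writing at 1 trades that guarantee for lower latency.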
The One Ring
Storage - Vnodes
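A rough sketch of the ring idea, assuming a toy setup: each node claims several tokens (vnodes) on a ring, a partition key is hashed to a token, and the replicas are the next distinct nodes found walking clockwise. The `Ring` class, the node names, and the use of MD5 are all assumptions for illustration; Cassandra's default partitioner uses Murmur3, and its real replica placement also depends on the replication strategy.

```python
import bisect
import hashlib


def token(value: str) -> int:
    # Stand-in hash for illustration only (Cassandra's default is Murmur3).
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, nodes, vnodes_per_node=8):
        # Each node owns several tokens scattered around the ring.
        self.entries = sorted(
            (token(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self.tokens = [t for t, _ in self.entries]
        self.node_count = len(set(nodes))

    def replicas(self, partition_key: str, rf: int = 3):
        """Walk clockwise from the key's token, collecting rf distinct nodes."""
        rf = min(rf, self.node_count)
        i = bisect.bisect_left(self.tokens, token(partition_key))
        found = []
        while len(found) < rf:
            node = self.entries[i % len(self.entries)][1]  # wrap around the ring
            if node not in found:
                found.append(node)
            i += 1
        return found


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("111", rf=3))  # three distinct nodes; which ones depends on the hash
```

Because every node holds many small token ranges instead of one big one, adding or removing a node moves a little data from many peers rather than a lot from one neighbor.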
Data Model
● Wide rows
● Slice Queries
● Denormalization
● Index tables
Data Model - Simple Key
CREATE TABLE email_app.emails (
    user_id text,
    subject text,
    to_add text,
    cc text,
    body text,
    PRIMARY KEY (user_id)
);
ROW KEY: user_id
Data Model - Simple Inserts
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'party', '[email protected]', '[email protected]', 'at my place');
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('999', 'wat', '[email protected]', '[email protected]', 'is going on?');
Data Model - Simple Inserts Result
SELECT * FROM email_app.emails;
Row key   subject   to_add   cc         body
111       party     cat@     hippo@     at my place
999       wat       horse@   giraffe@   is going on?
Mental Model - Nested Hash
111 (Row Key)
    subject, to, cc, body (Column Values)
999 (Row Key)
    subject, to, cc, body (Column Values)
Data Model - Simple Insert - Again
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'party', '[email protected]', '[email protected]', 'at my place');
Row key   subject   to_add   cc         body
111       party     cat@     hippo@     at my place
999       wat       horse@   giraffe@   is going on?
IDEMPOTENT: running the same insert again leaves the table unchanged.
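The nested-hash picture and the idempotent insert can be mimicked with a plain Python dict. This is a mental-model sketch only, not how Cassandra stores data; the `insert` helper is made up for the example, and the values are the shortened ones shown on the result slides.

```python
table = {}  # row key (user_id) -> {column name -> value}


def insert(table, user_id, **columns):
    """Upsert: create the row if missing, otherwise overwrite the given columns."""
    table.setdefault(user_id, {}).update(columns)


row_111 = dict(subject="party", to_add="cat@", cc="hippo@", body="at my place")
insert(table, "111", **row_111)
insert(table, "999", subject="wat", to_add="horse@", cc="giraffe@", body="is going on?")

snapshot = {key: dict(cols) for key, cols in table.items()}
insert(table, "111", **row_111)  # same insert again

print(table == snapshot)  # True: the repeat changed nothing
```

An insert is really "set these columns for this row key", which is why repeating it is harmless.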
Data Model - Composite Key 1
CREATE TABLE email_app.emails (
    user_id text,
    subject text,
    to_add text,
    cc text,
    body text,
    PRIMARY KEY (user_id, subject)
);
ROW KEY: user_id
CLUSTERING KEY: subject
Data Model - Composite Insert 1
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'party', '[email protected]', '[email protected]', 'at my place');
Same as Before. Right???
Data Model - Composite Insert Result
SELECT * FROM emails WHERE user_id = '111';
Row key 111:
party|to_add   party|cc   party|body
cat@           hippo@     at my place
The "party" prefix on each column name is the Subject value.
Mental Model - Nested Hash
111 (Row Key: user_id)
    party (Clustering Column: subject)
        to_add, cc, body (Column Values)
Data Model - Composite Insert 2
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'swim', '[email protected]', '[email protected]', 'in the pool');
Composite Insert 2 Result
SELECT * FROM emails WHERE user_id = '111';
Row key 111:
party|to_add   party|cc   party|body    swim|to_add   swim|cc   swim|body
cat@           hippo@     at my place   cat@          hippo@b   in the pool
The prefix on each column name is the Subject value.
Sorted by clustering column - "subject"
Mental Model - Nested Sorted Hash
111 (Row Key: user_id)
    party (Clustering Column: subject)
        to, cc, body (Column Values)
    swim (Clustering Column: subject)
        to, cc, body (Column Values)
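The nested sorted hash can be sketched the same way: an outer dict keyed by row key, an inner dict keyed by the clustering value, read back in sorted order. This is an illustrative model only; `insert` and `select` are hypothetical helpers, and the values are the shortened ones from the result slide.

```python
rows = {}  # user_id -> {subject -> {column name -> value}}


def insert(user_id, subject, **columns):
    """Upsert one clustered row inside the partition for user_id."""
    rows.setdefault(user_id, {}).setdefault(subject, {}).update(columns)


def select(user_id):
    """Return (subject, columns) pairs in clustering order."""
    return sorted(rows.get(user_id, {}).items())


# Inserted out of order on purpose.
insert("111", "swim", to_add="cat@", cc="hippo@b", body="in the pool")
insert("111", "party", to_add="cat@", cc="hippo@", body="at my place")

print([subject for subject, _ in select("111")])  # ['party', 'swim']
```

Both emails now live under the single row key 111, and reads come back ordered by subject no matter the insert order.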
Why sorted?
SELECT * FROM emails WHERE user_id = '111' AND (subject) >= ('s') AND (subject) < ('t');
Row key 111:
party|to_add   party|cc   party|body
cat@           giraffe@   at my place
swim|to_add    swim|cc    swim|body
cat@           hippo@b    in the pool
SLICE QUERIES!! The range 's' <= subject < 't' picks out only the contiguous swim cells.
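Why sorting makes slices cheap can be sketched with binary search over sorted cell names. This is a toy model, assuming cells are named by (subject, column) and kept in one sorted list per partition; `slice_cells` is a made-up helper, not a Cassandra API, and the values are the shortened ones from the earlier result slide.

```python
import bisect

# Cells of partition '111', named (subject, column) and kept in sorted order.
cells = sorted([
    (("party", "to_add"), "cat@"),
    (("party", "cc"), "hippo@"),
    (("party", "body"), "at my place"),
    (("swim", "to_add"), "cat@"),
    (("swim", "cc"), "hippo@b"),
    (("swim", "body"), "in the pool"),
])
keys = [name for name, _ in cells]


def slice_cells(lo, hi):
    """Cells whose subject s satisfies lo <= s < hi, found with two binary searches."""
    start = bisect.bisect_left(keys, (lo,))
    end = bisect.bisect_left(keys, (hi,))
    return cells[start:end]


for name, value in slice_cells("s", "t"):
    print(name, value)  # only the three swim cells
```

The slice is one contiguous run of the sorted list, so finding it takes two lookups rather than a scan of every cell.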
DM - Compound Composite Key
CREATE TABLE email_app.emails (
    user_id text,
    subject text,
    to_add text,
    cc text,
    body text,
    PRIMARY KEY ((user_id, subject), to_add)
);
ROW KEY: (user_id, subject)
CLUSTERING KEY: to_add
Composite / Compound Inserts
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'wat', '[email protected]', '[email protected]', 'is going on?');
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'party', '[email protected]', '[email protected]', 'at my place');
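Composite / Compound Inserts Result
SELECT * FROM emails WHERE user_id = '111' AND subject = 'party';
Row key 111:party:
cat@|cc   cat@|body
hippo@    at my place
The "cat@" prefix on each column name is the to_add value.
SELECT * FROM emails WHERE user_id = '111';
This second query supplies only half of the row key, so it cannot locate a row: both user_id and subject are needed.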
Data Model - Composite Insert 1
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'party', '[email protected]', '[email protected]', 'all the time');
SELECT * FROM emails WHERE user_id = '111' AND subject = 'party';
Row key 111:party:
cat@|cc    cat@|body     dog@|cc   dog@|body
giraffe@   at my place   hippo@b   all the time
The prefix on each column name is the to_add value.
Sorting / slice on - "to_add"
DM - Compound Composite Key 2
CREATE TABLE email_app.emails (
    user_id text,
    subject text,
    to_add text,
    cc text,
    body text,
    PRIMARY KEY ((user_id, subject), to_add, cc)
);
ROW KEY: (user_id, subject)
CLUSTERING KEYS: to_add, cc
Composite / Clustered Inserts
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'party', '[email protected]', '[email protected]', 'all the time');
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'party', '[email protected]', '[email protected]', 'At my place');
INSERT INTO email_app.emails (user_id, subject, to_add, cc, body) VALUES ('111', 'party', '[email protected]', '[email protected]', 'At my place');
DM - Composite / Clustered Inserts
SELECT * FROM emails WHERE user_id = '111' AND subject = 'party';
Row key 111|party:
cat@|hippo@|body   cat@|mouse@|body   dog@|hippo@|body
at my place        at my place        all the time
Slice on (to_add) OR (to_add, cc)
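Mental Model - Nested Sorted Hash
111|party (Row Key: user_id + subject)
    cat (Clustering Column: to_add)
        hippo (Clustering Column: cc)
            body (Column Value)
        mouse (Clustering Column: cc)
            body (Column Value)
    dog (Clustering Column: to_add)
        hippo (Clustering Column: cc)
            body (Column Value)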
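The last mental model can be sketched in Python too: the row key is the pair (user_id, subject), and inside a partition the rows are ordered by (to_add, cc). This is an illustration only; `insert` and `select` are hypothetical helpers, and the addresses are the shortened ones from the result slide.

```python
partitions = {}  # (user_id, subject) -> {(to_add, cc) -> body}


def insert(user_id, subject, to_add, cc, body):
    """Upsert one row; the full row key picks the partition."""
    partitions.setdefault((user_id, subject), {})[(to_add, cc)] = body


def select(user_id, subject, to_add=None):
    """Rows of one partition in clustering order, optionally sliced on to_add."""
    found = sorted(partitions.get((user_id, subject), {}).items())
    if to_add is not None:
        found = [row for row in found if row[0][0] == to_add]
    return found


insert("111", "party", "dog@", "hippo@", "all the time")
insert("111", "party", "cat@", "hippo@", "at my place")
insert("111", "party", "cat@", "mouse@", "at my place")

for clustering, body in select("111", "party"):
    print(clustering, body)
# ('cat@', 'hippo@') at my place
# ('cat@', 'mouse@') at my place
# ('dog@', 'hippo@') all the time

print(select("111", "party", to_add="cat@"))  # just the two cat@ rows
```

In this model `select` needs both user_id and subject to find a partition at all, while to_add (and then cc) only narrows the slice within it.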
Part 2 / 8 of this 7 hour talk
● Denormalization
● Index Column Families
● Cassandra Internals (memtables, SSTables, compaction, repair)
Part 8 / 8: The Future
● Continually improving
● More and more adoption
● Awesome projects
● http://www.datastax.com/documentation/cassandra/2.0/pdf/cassandra20.pdf
● http://planetcassandra.org/