Cassandra Day London 2015: Data Modeling 101
-
Upload
planet-cassandra -
Category
Technology
-
view
192 -
download
3
Transcript of Cassandra Day London 2015: Data Modeling 101
![Page 1: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/1.jpg)
©2013 DataStax Confidential. Do not distribute without consent.
@PatrickMcFadin
Patrick McFadinChief Evangelist for Apache Cassandra
Introduction to Data Modeling
1
![Page 2: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/2.jpg)
My Background
…ran into this problem
![Page 3: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/3.jpg)
Gave it my best shot
shard 1 shard 2 shard 3 shard 4
router
client
Patrick,All your wildest
dreams will come true.
![Page 4: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/4.jpg)
Just add complexity!
![Page 5: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/5.jpg)
A new plan
![Page 6: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/6.jpg)
ACID vs CAPACID
CAP - Pick two
Atomic - All or none Consistency - Only valid data is written Isolation - One operation at a time Durability - Once committed, it stays that way
Consistency - All data on cluster Availability - Cluster always accepts writes Partition tolerance - Nodes in cluster can’t talk to each other
Cassandra let’s you tune this
![Page 7: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/7.jpg)
Relational Data Models• 5 normal forms • Foreign Keys • Joins
deptId First Last1 Edgar Codd2 Raymond Boyce
id Dept
1 Engineering
2 Math
Employees
Department
![Page 8: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/8.jpg)
Relational Modeling
Data
Models
Application
![Page 9: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/9.jpg)
Cassandra Modeling
Data
Models
Application
![Page 10: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/10.jpg)
CQL vs SQL•No joins •No aggregations
deptId First Last1 Edgar Codd2 Raymond Boyce
id Dept
1 Engineering
2 Math
Employees
DepartmentSELECT e.First, e.Last, d.DeptFROM Department d, Employees eWHERE ‘Codd’ = e.LastAND e.deptId = d.id
![Page 11: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/11.jpg)
Denormalization• Combine table columns into a single view •No joins
SELECT First, Last, Dept FROM employees WHERE id = ‘1’
id First Last Dept
1 Edgar Codd Engineering
2 Raymond Boyce Math
Employees
![Page 12: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/12.jpg)
No more sequences• Great for auto-creation of Ids • Guaranteed unique •Needs ACID to work. (Sorry. No sharding)
INSERT INTO user (id, firstName, LastName)VALUES (seq.nextVal(), ‘Ted’, ‘Codd’)
![Page 13: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/13.jpg)
No sequences???• Almost impossible in a distributed system • Couple of great choices • Natural Key - Unique values like email • Surrogate Key - UUID
• Universal Unique ID • 128 bit number represented in character form • Easily generated on the client • Same as GUID for the MS folks
99051fe9-6a9c-46c2-b949-38ef78858dd0
![Page 14: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/14.jpg)
KillrVideo.com•Hosted on Azure • Code on GitHub • Also on your USB • Data Model for examples
![Page 15: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/15.jpg)
Entity Table• Simple view of a single
user • UUID used for ID • Simple primary key
// Users keyed by idCREATE TABLE users ( userid uuid, firstname text, lastname text, email text, created_date timestamp, PRIMARY KEY (userid));
SELECT firstname, lastnameFROM userWHERE userId = 99051fe9-6a9c-46c2-b949-38ef78858dd0
![Page 16: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/16.jpg)
CQL Collections
![Page 17: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/17.jpg)
CQL Collections•Meant to be dynamic part of table • Update syntax is very different from insert • Reads require all of collection to be read
![Page 18: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/18.jpg)
CQL Set• Set is sorted by CQL type comparator
INSERT INTO collections_example (id, set_example)VALUES(1, {'1-one', '2-two'});
set_example set<text>
Collection name Collection type CQL Type
![Page 19: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/19.jpg)
CQL Set Operations• Adding an element to the set
• After adding this element, it will sort to the beginning.
• Removing an element from the set
UPDATE collections_exampleSET set_example = set_example + {'3-three'} WHERE id = 1;
UPDATE collections_exampleSET set_example = set_example + {'0-zero'} WHERE id = 1;
UPDATE collections_exampleSET set_example = set_example - {'3-three'} WHERE id = 1;
![Page 20: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/20.jpg)
CQL List• Ordered by insertion • Use with caution
list_example list<text>
Collection name Collection type
INSERT INTO collections_example (id, list_example)VALUES(1, ['1-one', '2-two']);
CQL Type
![Page 21: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/21.jpg)
CQL List Operations• Adding an element to the end of a list
• Adding an element to the beginning of a list
• Deleting an element from a list
UPDATE collections_exampleSET list_example = list_example + ['3-three'] WHERE id = 1;
UPDATE collections_exampleSET list_example = ['0-zero'] + list_example WHERE id = 1;
UPDATE collections_exampleSET list_example = list_example - ['3-three'] WHERE id = 1;
![Page 22: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/22.jpg)
CQL Map• Key and value • Key is sorted by CQL type comparator
INSERT INTO collections_example (id, map_example)VALUES(1, { 1 : 'one', 2 : 'two' });
map_example map<int,text>
Collection name Collection type Value CQL TypeKey CQL Type
![Page 23: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/23.jpg)
CQL Map Operations• Add an element to the map
• Update an existing element in the map
• Delete an element in the map
UPDATE collections_example SET map_example[3] = 'three' WHERE id = 1;
UPDATE collections_example SET map_example[3] = 'tres' WHERE id = 1;
DELETE map_example[3] FROM collections_example WHERE id = 1;
![Page 24: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/24.jpg)
Entity with collections• Same type of entity • SET type for dynamic data • tags for each video
// Videos by idCREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, location text, location_type int, preview_image_location text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid));
![Page 25: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/25.jpg)
Index (or lookup) tables• Table arranged to find data • Denormalized for speed • Find videos for a user
// One-to-many from user point of view (lookup table)CREATE TABLE user_videos ( userid uuid, added_date timestamp, videoid uuid, name text, preview_image_location text, PRIMARY KEY (userid, added_date, videoid)) WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);
![Page 26: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/26.jpg)
Primary Key• First column name is the Partition Key • Subsequent are the Clustering Columns • Videos will be ordered by added_date and
// One-to-many from user point of view (lookup table)CREATE TABLE user_videos ( userid uuid, added_date timestamp, videoid uuid, name text, preview_image_location text, PRIMARY KEY (userid, added_date, videoid)) WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);
![Page 27: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/27.jpg)
Primary key relationship
PRIMARY KEY (userId,added_date,videoId)
![Page 28: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/28.jpg)
Primary key relationship
Partition Key
PRIMARY KEY (userId,added_date,videoId)
![Page 29: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/29.jpg)
Primary key relationship
Partition Key Clustering Columns
PRIMARY KEY (userId,added_date,videoId)
![Page 30: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/30.jpg)
Primary key relationship
Partition Key Clustering Columns
A12378E55F5A32
PRIMARY KEY (userId,added_date,videoId)
![Page 31: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/31.jpg)
2005:12:1:102005:12:1:92005:12:1:82005:12:1:7
5F22A0BC
Primary key relationship
Partition Key Clustering Columns
F2B3652CFFB3652D7AB3652C
PRIMARY KEY (userId,added_date,videoId)
A12378E55F5A32
SELECT videoId FROM user_videos WHERE userId = A12378E55F5A32
AND added_date = ‘2005-12-1’
AND videoId = 5F22A0BC
![Page 32: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/32.jpg)
Clustering Order• Clustering Columns have default order • Use to specify order • Bonus: Sorts on disk for speed
// One-to-many from user point of view (lookup table)CREATE TABLE user_videos ( userid uuid, added_date timestamp, videoid uuid, name text, preview_image_location text, PRIMARY KEY (userid, added_date, videoid)) WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);
![Page 33: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/33.jpg)
Multiple Lookups• Same data • Different lookup pattern // Index for tag keywords
CREATE TABLE videos_by_tag ( tag text, videoid uuid, added_date timestamp, name text, preview_image_location text, tagged_date timestamp, PRIMARY KEY (tag, videoid));
// Index for tags by first letter in the tagCREATE TABLE tags_by_letter ( first_letter text, tag text, PRIMARY KEY (first_letter, tag));
![Page 34: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/34.jpg)
Many to Many Relationships• Two views • Different directions • Insert data in a batch
// Comments for a given videoCREATE TABLE comments_by_video ( videoid uuid, commentid timeuuid, userid uuid, comment text, PRIMARY KEY (videoid, commentid)) WITH CLUSTERING ORDER BY (commentid DESC);
// Comments for a given userCREATE TABLE comments_by_user ( userid uuid, commentid timeuuid, videoid uuid, comment text, PRIMARY KEY (userid, commentid)) WITH CLUSTERING ORDER BY (commentid DESC);
![Page 35: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/35.jpg)
Use Case Example
![Page 36: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/36.jpg)
Example 1: Weather Station•Weather station collects data • Cassandra stores in sequence • Application reads in sequence
![Page 37: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/37.jpg)
Use case
• Store data per weather station • Store time series in order: first to last
• Get all data for one weather station • Get data for a single date and time • Get data for a range of dates and times
Needed Queries
Data Model to support queries
![Page 38: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/38.jpg)
Data Model•Weather Station Id and Time
are unique • Store as many as needed
CREATE TABLE temperature ( weather_station text, year int, month int, day int, hour int, temperature double, PRIMARY KEY (weather_station,year,month,day,hour) );
INSERT INTO temperature(weather_station,year,month,day,hour,temperature) VALUES (‘10010:99999’,2005,12,1,7,-5.6);
INSERT INTO temperature(weather_station,year,month,day,hour,temperature) VALUES (‘10010:99999’,2005,12,1,8,-5.1);
INSERT INTO temperature(weather_station,year,month,day,hour,temperature) VALUES (‘10010:99999’,2005,12,1,9,-4.9);
INSERT INTO temperature(weather_station,year,month,day,hour,temperature) VALUES (‘10010:99999’,2005,12,1,10,-5.3);
![Page 39: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/39.jpg)
Storage Model - Logical View
2005:12:1:7
-5.6
2005:12:1:8
-5.1
2005:12:1:9
-4.9
SELECT weather_station,hour,temperature FROM temperature WHERE weatherstation_id='10010:99999';
10010:99999
10010:99999
10010:99999
weather_station hour temperature
2005:12:1:10
-5.310010:99999
![Page 40: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/40.jpg)
2005:12:1:12
-5.4
2005:12:1:11
-4.9 -5.3-4.9-5.1
2005:12:1:7
-5.6
Storage Model - Disk Layout
2005:12:1:8 2005:12:1:910010:99999
2005:12:1:10
Merged, Sorted and Stored Sequentially
SELECT weather_station,hour,temperature FROM temperature WHERE weatherstation_id='10010:99999';
![Page 41: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/41.jpg)
Query patterns• Range queries • “Slice” operation on disk
SELECT weatherstation,hour,temperature FROM temperature WHERE weatherstation=‘10010:99999' AND year = 2005 AND month = 12 AND day = 1 AND hour >= 7 AND hour <= 10;
Single seek on disk
2005:12:1:12
-5.4
2005:12:1:11
-4.9 -5.3-4.9-5.1
2005:12:1:7
-5.6
2005:12:1:8 2005:12:1:910010:99999
2005:12:1:10
Partition key for locality
![Page 42: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/42.jpg)
Query patterns• Range queries • “Slice” operation on disk
Programmers like this
Sorted by event_time2005:12:1:7
-5.6
2005:12:1:8
-5.1
2005:12:1:9
-4.9
10010:99999
10010:99999
10010:99999
weather_station hour temperature
2005:12:1:10
-5.310010:99999
SELECT weatherstation,hour,temperature FROM temperature WHERE weatherstation=‘10010:99999' AND year = 2005 AND month = 12 AND day = 1 AND hour >= 7 AND hour <= 10;
![Page 43: Cassandra Day London 2015: Data Modeling 101](https://reader034.fdocuments.net/reader034/viewer/2022051213/55a68fcc1a28abd0378b45ed/html5/thumbnails/43.jpg)
Thank you!
Bring the questions
Follow me on twitter @PatrickMcFadin