Wide Partitions Data Modeling - percona.com
Transcript of Wide Partitions Data Modeling - percona.com
![Page 1: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/1.jpg)
![Page 2: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/2.jpg)
Wide Partitions Data Modeling
![Page 4: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/4.jpg)
4
About ScyllaDB
+ The Real-Time Big Data Database
+ Drop-in replacement for Apache Cassandra and Amazon DynamoDB
+ 10X the performance & low tail latency
+ Open Source, Enterprise and Cloud options
+ Founded by the creators of KVM hypervisor
+ HQs: Palo Alto, CA, USA; Herzelia, Israel; Warsaw, Poland
About ScyllaDB
We are hiring!
![Page 5: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/5.jpg)
5
NoSQL Intro
Wide Partition Data Modeling
Choosing the right Partition Key
Choosing the right Clustering Key
Using Materialized Views / Secondary Index
Summary
Agenda
![Page 6: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/6.jpg)
1.NoSQL in 5 mins
![Page 7: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/7.jpg)
NoSQL - By Availability vs Consistency
7
Pick Two
Availability
Partition ToleranceConsistency
Or use a more granular model, like PACELC
![Page 8: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/8.jpg)
NoSQL - By Data Model
Key / Value Redis, Aerospike, RocksDB
Document store MongoDB, Couchbase
Wide column store Scylla, Apache Cassandra,HBase, DynamoDB
Graph Neo4j, JanusGraph
Com
plex
ity
8
![Page 9: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/9.jpg)
Cluster - Node Ring
9
![Page 10: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/10.jpg)
Scylla Architecture
Active/active, replicated, auto-sharded
10
![Page 11: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/11.jpg)
2.Data Modeling
11
![Page 12: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/12.jpg)
12
ApplicationModelData
ModelDataApplication
Relational:
NoSQL:
![Page 13: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/13.jpg)
13
UpdateUpdate schema / application
MeasureUse Metric to evaluate the data model
Data ModelOptimize the model base on Apps SLA
Application Estimate Read and Write pattern
Circle of (NoSQL) Life
![Page 14: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/14.jpg)
Partition Key
14
![Page 15: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/15.jpg)
Structure of data in Scylla
● Data is stored in MemTables and SSTables
● The highest level are partitions, identified by a partition key
● Partitions contain rows, identified by a clustering key
● Data is sorted in the partition by the clustering key, helping the storage engine to store data efficiently for querying
15
![Page 16: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/16.jpg)
Key / Value ExampleCREATE TABLE pet_owner ( pet_chip_id uuid, owner uuid, pet_name text, PRIMARY KEY (pet_chip_id));
Partition Key pet_chip_id owner pet_name80d39c78-9dc0-11eb-a8b3-0242ac130003 642adfee-6ad9-... Buddy80d39c78-9dc0-11eb-a8b3-0242ac130003 642adfee-6ad9-... Rocky80d39c78-9dc0-11eb-a8b3-0242ac130003 642adfee-6ad9-... Cat
... ... ...
![Page 17: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/17.jpg)
Choosing a Partition Key● High Cardinality ● Even Distribution
Avoid
● Low Cardinality● Hot Partition● Large Partition
17https://www.codedrome.com/zipfs-law-in-python/
![Page 18: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/18.jpg)
Key / Value ExampleCREATE TABLE pet_owner ( pet_chip_id uuid, owner uuid, pet_name text, PRIMARY KEY (pet_chip_id));
Partition Key pet_chip_id owner pet_name80d39c78-9dc0-11eb-a8b3-0242ac130003 642adfee-6ad9-... Buddy80d39c78-9dc0-11eb-a8b3-0242ac130003 642adfee-6ad9-... Rocky80d39c78-9dc0-11eb-a8b3-0242ac130003 642adfee-6ad9-... Cat
... ... ...
![Page 19: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/19.jpg)
19
![Page 20: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/20.jpg)
Choosing a Partition Key
20
● User Name● User ID● User ID + Time● Sensor ID ● Sensor ID + Time● Customer
● State● Age● Favorite NBA Team● Team Angel or Team Spike
https://commons.wikimedia.org/
![Page 21: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/21.jpg)
Hot (top) Partition
21
Why Amazon DynamoDB isn’t for everyone, Forrest Brazealhttps://acloudguru.com/blog/engineering/why-amazon-dynamodb-isnt-for-everyone-and-how-to-decide-when-it-s-for-you
Nodetool top partitions
![Page 22: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/22.jpg)
Wide Partition ExampleQuery:SELECT * from heartrate_v10 WHERE
pet_chip_id = 80d39c78-9dc0-11eb-a8b3-0242ac130003 LIMIT 1;
SELECT * from heartrate_v10 WHERE
pet_chip_id = 80d39c78-9dc0-11eb-a8b3-0242ac130003 AND
time >= '2021-05-01 01:00+0000' AND
time < '2021-05-01 01:03+0000';
22
https://gist.github.com/tzach/7486f1a0cc904c52f4514f20f14d2a97
![Page 23: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/23.jpg)
Wide Partition ExampleCREATE TABLE heartrate_v10 ( pet_chip_id uuid, owner uuid, time timestamp, heart_rate int, PRIMARY KEY (pet_chip_id, time));
pet_chip_id time heart_rate
80d39c78-9dc0-11eb-a8b3-0242ac130003 2021-05-01 01:00:00.000000+0000 120
80d39c78-9dc0-11eb-a8b3-0242ac130003 2021-05-01 01:01:00.000000+0000 121
80d39c78-9dc0-11eb-a8b3-0242ac130003 2021-05-01 01:02:00.000000+0000 120
Partition Key Clustering Key
![Page 24: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/24.jpg)
24
Wide Partition ExampleLarge Partition?
![Page 25: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/25.jpg)
Choosing a Clustering Key
25
● Allow useful range queries● Allow useful LIMIT
https://commons.wikimedia.org/
![Page 26: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/26.jpg)
26
SELECT * from heartrate_v10 WHERE
pet_chip_id = 80d39c78-9dc0-11eb-a8b3-0242ac130003 LIMIT 1;
CREATE TABLE heartrate_v5 (
pet_chip_id uuid,
time timestamp,
heart_rate int,
PRIMARY KEY (pet_chip_id, time)
) WITH CLUSTERING ORDER BY (time DESC);
Partition Key Clustering Key
![Page 27: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/27.jpg)
27
CREATE TABLE heartrate_v6 (
pet_chip_id uuid,
date text,
time timestamp,
heart_rate int,
PRIMARY KEY ((pet_chip_id, date), time));
Partition Key Clustering Key
Too Wide Partition ?
![Page 28: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/28.jpg)
3.Materialized Views and Secondary Index
28
![Page 29: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/29.jpg)
Example
CREATE TABLE heartrate_v10 ( pet_chip_id uuid, owner uuid, time timestamp, heart_rate int, PRIMARY KEY (pet_chip_id, time));
https://gist.github.com/tzach/7486f1a0cc904c52f4514f20f14d2a97
29
Partition KeyClustering Key
![Page 30: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/30.jpg)
Example
SELECT * FROM heartrate_v10 WHERE pet_chip_id = a2a60505-3e17-4ad4-8e1a-f11139caa1cc;
SELECT * FROM heartrate_v10 WHERE owner = 642adfee-6ad9-4ca5-aa32-a72e506b8ad8;
SELECT * FROM heartrate_v10 WHERE owner = 642adfee-6ad9-4ca5-aa32-a72e506b8ad8 ALLOW FILTERING;
30https://gist.github.com/tzach/4b9dadbc6e8a9c50369da05631c5e13e
![Page 31: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/31.jpg)
CQL syntax
CREATE MATERIALIZED VIEW heartrate_by_owner AS SELECT * FROM heartrate_v10 WHERE owner IS NOT NULL AND pet_chip_id IS NOT NULL AND time IS NOT NULL PRIMARY KEY(owner, pet_chip_id, time);
SELECT * FROM heartrate_by_owner WHERE owner = 642adfee-6ad9-4ca5-aa32-a72e506b8ad8;
https://docs.scylladb.com/getting-started/mv/ 31
![Page 32: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/32.jpg)
Example
32
pet_chip_id time owner heart_rate80d39c78-9dc0-11eb-a8b3-0242ac130003 2021-05-01 01:00:00.000000+0000
642adfee-6ad9-4ca5-aa32-a72e506b8ad8 120
80d39c78-9dc0-11eb-a8b3-0242ac130003 2021-05-01 01:01:00.000000+0000
642adfee-6ad9-4ca5-aa32-a72e506b8ad8 121
80d39c78-9dc0-11eb-a8b3-0242ac130003 2021-05-01 01:02:00.000000+0000
642adfee-6ad9-4ca5-aa32-a72e506b8ad8 120
owner pet_chip_id time heart_rate642adfee-6ad9-4ca5-aa32-a72e506b8ad8
80d39c78-9dc0-11eb-a8b3-0242ac130003 2021-05-01 01:00:00.000000+0000 120
642adfee-6ad9-4ca5-aa32-a72e506b8ad8
80d39c78-9dc0-11eb-a8b3-0242ac130003 2021-05-01 01:01:00.000000+0000 121
642adfee-6ad9-4ca5-aa32-a72e506b8ad8
80d39c78-9dc0-11eb-a8b3-0242ac130003 2021-05-01 01:02:00.000000+0000 120
Base Table
ViewCREATE MATERIALIZED VIEW
![Page 33: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/33.jpg)
1. INSERT INTO heartrate (pet_chip_id,Owner,Time,heart_rate) VALUES (..);
View is another table
2. INSERT INTO heartrate
Base replica
View replicaCoordinator
3. INSERT INTO heartrate_by_owner
33
![Page 34: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/34.jpg)
View is another table
2.SELECT * FROM heartrate_by_owner WHERE owner = ‘642a..’;
Base replica
View replicaCoordinator
1.SELECT * FROM heartrate_by_owner WHERE owner = ‘642a..’;
34
![Page 35: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/35.jpg)
Choosing a MV Index Key
● High Cardinality ● Even Distribution
Avoid
● Low Cardinality● Hot Partition● Large Partition
35https://www.codedrome.com/zipfs-law-in-python/
![Page 36: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/36.jpg)
Wait, what about JOINs?
36
- Try to avoid them using denormalization- For ad-hoc operations - use external engine: Spark, Presto
![Page 37: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/37.jpg)
37
UpdateUpdate schema / application
MeasureUse Metric to evaluate the data model
Data ModelOptimize the model base on Apps SLA
Application Estimate Read and Write pattern
Circle of (NoSQL) Life
![Page 38: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/38.jpg)
Download Scylla Open Source:scylladb.com/download
Learn more https://university.scylladb.com/Contact me [email protected]
The Speaker’s camera displays
here
Experience Scylla for Yourself
38
![Page 39: Wide Partitions Data Modeling - percona.com](https://reader033.fdocuments.net/reader033/viewer/2022051307/6279d23fc99db2052015748a/html5/thumbnails/39.jpg)
39