Real Data Models of Silicon Valley
Patrick McFadin
Chief Evangelist for Apache Cassandra
@PatrickMcFadin
It's been an epic year
I've had a ton of fun!
• Traveling the world talking to people like you!
Warsaw
Stockholm
Melbourne
New York
Vancouver
Dublin
What's new?
• 2.1 is out!
• Amazing changes for performance and stability
Where are we going?
• 3.0 is next. Just hold on…
KillrVideo.com
• 2012 Summit
• Complete example for data modeling
(Page mockup, www.killrvideos.com: video title, description, rating, tags (Foo, Bar), comments, recommended videos, username, "Upload New!" link, MeowAds by Google. Cat drawing by goodrob13 on Flickr.)
It’s alive!!!
• Hosted on Azure
• Code on GitHub
Data Model - Revisited
• Add in some 2.1 data models
• Replace (or remove) some app code
• Become a part of Cassandra OSS download
User Defined Types
• Complex data in one place
• No multi-gets (multi-partitions)
• Nesting!

CREATE TYPE address (
    street text,
    city text,
    zip_code int,
    country text,
    cross_streets set<text>
);
Before

CREATE TABLE videos (
    videoid uuid,
    userid uuid,
    name varchar,
    description varchar,
    location text,
    location_type int,
    preview_thumbnails map<text,text>,
    tags set<varchar>,
    added_date timestamp,
    PRIMARY KEY (videoid)
);

CREATE TABLE video_metadata (
    video_id uuid PRIMARY KEY,
    height int,
    width int,
    video_bit_rate set<text>,
    encoding text
);

-- Two separate reads; the 2 stands in for the video's uuid.
SELECT * FROM videos WHERE videoid = 2;
SELECT * FROM video_metadata WHERE video_id = 2;
Title: Introduction to Apache Cassandra
Description: A one-hour talk on everything you need to know about a totally amazing database.
Playback rate: 480 720
In-application join
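A minimal sketch of what "in-application join" means here, assuming two in-memory dicts as stand-ins for the videos and video_metadata tables. The names VIDEOS, VIDEO_METADATA and fetch_video_with_metadata are hypothetical and are not part of KillrVideo; a real application would issue the two SELECT statements through a driver instead.

```python
# Hypothetical in-memory stand-ins for the two tables, keyed by video id.
VIDEOS = {
    2: {"name": "Introduction to Apache Cassandra"},
}
VIDEO_METADATA = {
    2: {"video_bit_rate": {"480", "720"}},
}


def fetch_video_with_metadata(video_id):
    """Combine the two lookups in application code."""
    video = dict(VIDEOS[video_id])                # read 1: the videos table
    metadata = VIDEO_METADATA.get(video_id, {})   # read 2: the video_metadata table
    video["metadata"] = dict(metadata)            # the "join" is done by the application
    return video


result = fetch_video_with_metadata(2)
print(result["name"], sorted(result["metadata"]["video_bit_rate"]))
```

Every page view pays for two reads and the merge step, which is the cost the embedded type is meant to remove.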
After
• Now video_metadata is embedded in videos

CREATE TYPE video_metadata (
    height int,
    width int,
    video_bit_rate set<text>,
    encoding text
);

CREATE TABLE videos (
    videoid uuid,
    userid uuid,
    name varchar,
    description varchar,
    location text,
    location_type int,
    preview_thumbnails map<text,text>,
    tags set<varchar>,
    metadata set <frozen<video_metadata>>,
    added_date timestamp,
    PRIMARY KEY (videoid)
);
Wait! Frozen??
• Staying out of technical debt
• 3.0 UDTs will not have to be frozen
• Applicable to User Defined Types and Tuples (wait for it…)
Do you want to build a schema? Do you want to store some JSON?
Let’s store some JSON

{
  "productId": 2,
  "name": "Kitchen Table",
  "price": 249.99,
  "description": "Rectangular table with oak finish",
  "dimensions": {
    "units": "inches",
    "length": 50.0,
    "width": 66.0,
    "height": 32
  },
  "categories": [
    {
      "category": "Home Furnishings",
      "catalogPage": 45,
      "url": "/home/furnishings"
    },
    {
      "category": "Kitchen Furnishings",
      "catalogPage": 108,
      "url": "/kitchen/furnishings"
    }
  ]
}
CREATE TYPE dimensions (
    units text,
    length float,
    width float,
    height float
);
CREATE TYPE category (
    catalogPage int,
    url text
);
CREATE TABLE product (
    productId int,
    name text,
    price float,
    description text,
    dimensions frozen <dimensions>,
    categories map <text, frozen <category>>,
    PRIMARY KEY (productId)
);
Let’s store some JSON

INSERT INTO product (productId, name, price, description, dimensions, categories)
VALUES (
    2,
    'Kitchen Table',
    249.99,
    'Rectangular table with oak finish',
    { units: 'inches', length: 50.0, width: 66.0, height: 32 },
    {
        'Home Furnishings': { catalogPage: 45, url: '/home/furnishings' },
        'Kitchen Furnishings': { catalogPage: 108, url: '/kitchen/furnishings' }
    }
);
dimensions frozen <dimensions>
categories map <text, frozen <category>>
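The mapping from the JSON document to the INSERT values can be sketched in a few lines. This is only an illustration under stated assumptions: the document is restated as valid JSON with categories as a list of objects, and to_insert_values is a hypothetical helper, not part of any Cassandra driver. It shows the nested "dimensions" object lining up with the frozen dimensions type and the category name becoming the map key.

```python
import json

# Valid-JSON restatement of the slide's product document (assumption:
# categories are a list of objects, each carrying its own "category" name).
PRODUCT_JSON = """
{
  "productId": 2,
  "name": "Kitchen Table",
  "price": 249.99,
  "description": "Rectangular table with oak finish",
  "dimensions": {"units": "inches", "length": 50.0, "width": 66.0, "height": 32},
  "categories": [
    {"category": "Home Furnishings", "catalogPage": 45, "url": "/home/furnishings"},
    {"category": "Kitchen Furnishings", "catalogPage": 108, "url": "/kitchen/furnishings"}
  ]
}
"""


def to_insert_values(doc):
    """Return values in the INSERT's column order:
    productId, name, price, description, dimensions, categories."""
    # The category name becomes the key of map<text, frozen<category>>;
    # the remaining fields form the category value.
    categories = {
        item["category"]: {"catalogPage": item["catalogPage"], "url": item["url"]}
        for item in doc["categories"]
    }
    return (
        doc["productId"],
        doc["name"],
        doc["price"],
        doc["description"],
        dict(doc["dimensions"]),  # one nested object -> one frozen<dimensions> value
        categories,
    )


values = to_insert_values(json.loads(PRODUCT_JSON))
print(values[5])
```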
Retrieving fields
Counters pt Deux
• Since 0.8
• Commit log replay would change counters
• Repair could change counters
• Performance was inconsistent. Lots of GC
The good
• Stable under load
• No commit log replay issues
• No repair weirdness
The bad
• Still can’t delete/reset counters
• Still needs to do a read before write.
Usage
Wait for it…
It’s the same! Carry on…
Static Fields
• New as of 2.0.6
• VERY specific, but useful
• Thrift people will like this
CREATE TABLE t (
    k text,
    s text STATIC,
    i int,
    PRIMARY KEY (k, i)
);
Why?

CREATE TABLE weather (
    id int,
    time timestamp,
    weatherstation_name text,
    temperature float,
    PRIMARY KEY (id, time)
);
Without static: the station name is stored again in every row.
ID = 1 (Partition Key / Storage Row Key)
Partition Row 1: 2014-09-08 12:00:00 : name = SFO, 2014-09-08 12:00:00 : temp = 63.4
Partition Row 2: 2014-09-08 12:01:00 : name = SFO, 2014-09-08 12:01:00 : temp = 63.9
Partition Row 3: 2014-09-08 12:02:00 : name = SFO, 2014-09-08 12:02:00 : temp = 64.0
With static: the station name is stored once for the whole partition.
ID = 1 (Partition Key / Storage Row Key)
Static: name = SFO
Partition Row 1: 2014-09-08 12:00:00 : temp = 63.4
Partition Row 2: 2014-09-08 12:01:00 : temp = 63.9
Partition Row 3: 2014-09-08 12:02:00 : temp = 64.0
CREATE TABLE weather (
    id int,
    time timestamp,
    weatherstation_name text static,
    temperature float,
    PRIMARY KEY (id, time)
);
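The effect of the static column on the storage row can be modeled with plain dicts. This is a simplified sketch, not Cassandra's actual on-disk format: the cell keys, the helper names and the reading times of 12:00, 12:01 and 12:02 are assumptions made for illustration.

```python
# Three readings for station id = 1, one per minute (times are assumed).
READINGS = [
    ("2014-09-08 12:00:00", 63.4),
    ("2014-09-08 12:01:00", 63.9),
    ("2014-09-08 12:02:00", 64.0),
]


def cells_without_static(station, readings):
    """Every CQL row carries its own copy of the station name."""
    cells = {}
    for time, temp in readings:
        cells[(time, "name")] = station
        cells[(time, "temp")] = temp
    return cells


def cells_with_static(station, readings):
    """The station name is stored once for the whole partition."""
    cells = {("name",): station}
    for time, temp in readings:
        cells[(time, "temp")] = temp
    return cells


plain = cells_without_static("SFO", READINGS)
static = cells_with_static("SFO", READINGS)
print(len(plain), len(static))
```

With three readings the model holds six cells without static and four with it; the saving grows by one cell for each additional row in the partition.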
Usage
• Put static at the end of the column declaration
• Can’t be a part of primary key
CREATE TABLE video_event (
    videoid uuid,
    userid uuid,
    preview_image_location text static,
    event varchar,
    event_timestamp timeuuid,
    video_timestamp bigint,
    PRIMARY KEY ((videoid, userid), event_timestamp, event)
) WITH CLUSTERING ORDER BY (event_timestamp DESC, event ASC);
Tuples
• A type that represents a group
• Up to 256 different elements
CREATE TABLE tuple_table (
    id int PRIMARY KEY,
    three_tuple frozen <tuple<int, text, float>>,
    four_tuple frozen <tuple<int, text, float, inet>>,
    five_tuple frozen <tuple<int, text, float, inet, ascii>>
);
Example Usage
• Track a drone’s position
• x, y, z in 3D Cartesian space

CREATE TABLE drone_position (
    droneId int,
    time timestamp,
    position frozen <tuple<float, float, float>>,
    PRIMARY KEY (droneId, time)
);
What about partition size?
• A CQL partition is a logical projection of a storage row
• Storage rows can have up to 2 billion cells
• Each cell can hold up to 2 GB of data
How much is too much?
• How many cells before performance degrades?
• How many bytes per partition before it’s unmanageable?
• What is “practical”?
Old answer
• 2011: Pre-Cassandra 1.2 (actually tested on 0.8)
• Aaron Morton, Cassandra MVP and Founder of The Last Pickle
Conclusion
• Keep partition (storage row) length < 10k cells
• Keep total size in bytes below 64 MB (multi-pass compaction)
• Multiple hits to the 64 KB page size will start to hurt
TL;DR - It’s a performance tunable
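A rough way to check a table design against the old guidance is to estimate cells and bytes per partition. The sketch below is an assumption-laden estimate, not a measurement: the cell count uses the commonly quoted approximation rows × (columns − primary key columns − static columns) + static columns, the 50-byte average cell size is made up for illustration, and the function names are hypothetical. The 10k-cell and 64 MB thresholds come from the conclusion above.

```python
def cells_in_partition(rows, columns, primary_key_columns, static_columns=0):
    """Approximate number of stored cells in one partition."""
    regular_columns = columns - primary_key_columns - static_columns
    return rows * regular_columns + static_columns


def check_partition(cells, avg_cell_bytes,
                    max_cells=10_000, max_bytes=64 * 1024 * 1024):
    """Compare an estimate against the old thresholds (10k cells, 64 MB)."""
    size = cells * avg_cell_bytes
    return {
        "cells": cells,
        "bytes": size,
        "cells_ok": cells < max_cells,
        "bytes_ok": size < max_bytes,
    }


# Weather table with the static station name: 4 columns, 2 of them in the
# primary key, 1 static. One reading per minute for a day is 1,440 rows.
day_cells = cells_in_partition(rows=1440, columns=4,
                               primary_key_columns=2, static_columns=1)
report = check_partition(day_cells, avg_cell_bytes=50)  # 50 bytes is an assumed average
print(report)
```

Because the limits are a performance tunable rather than hard limits, max_cells and max_bytes are left as parameters.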
The tests revisited
• Attempted to reproduce the same tests using CQL
• Cassandra 2.1, 2.0 and 1.2
• Tested partition sizes
1. 100
2. 2114
3. 5,000
4. 10,000
5. 100,000
6. 1,000,000
7. 10,000,000
8. 100,000,000
9. 1,000,000,000
Results
(Chart: time in mSec against cells per partition)
The new answer
• 100s of thousands of cells per partition is not a problem
• 100s of MB per partition is best operationally
• The issue to manage is operations
Thank You!
Follow me on Twitter for more @PatrickMcFadin
Cassandra Summit 2014
September 10 - 11 | #CassandraSummit