CQL3 and Data Modeling 101 with Apache Cassandra
chris-mceniry -
CQL3 & Data Modeling 101
with Apache Cassandra
San Diego Cassandra Meetup Feb 27, 2014
Not how you’ve done it before…
In the beginning… there was the Row, and the Column
And the Row was fast to find and scale,
And the Column was fast to order.
Cassandra (C*) Properties
• Column Oriented
• Log Structured
• Distributed Database
Column Oriented
• Columns actually hold the data
• Key/Value pair
• Name can be used to store meaning as well
Distributed Database
• Rows are used to distribute data across the cluster
• C* pulls the entire row into memory
• Can pull out individual parts or write to individual parts, but it’s still considered together
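To make the row-based distribution concrete, here is a minimal sketch that hashes a row key and maps it to one of a fixed list of nodes. The node names and the hash-modulo scheme are assumptions for illustration only; Cassandra itself uses a partitioner and a token ring, which this toy does not reproduce.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members


def node_for_row(row_key: str) -> str:
    """Pick a node from the hash of the row key (toy stand-in for a partitioner)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return NODES[token % len(NODES)]


# All columns of a row share the row key, so the whole row lands on one node.
for key in ("mac@mac.com", "liz@example.com", "steve@my.net"):
    print(key, "->", node_for_row(key))
```

The same key always maps to the same node, which is why a row can be found quickly and why a row is treated as one unit.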
Log Structured Updates
• Commitlog and sstables are log structured
• Oriented around appending (streaming at a known location)
• ==> Writes quickly
• And you want to avoid rewrites
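The append-only idea can be sketched with a toy in-memory log. This is not how Cassandra implements its commitlog or sstables; the class name and methods are hypothetical and only show why a write is cheap (append) while a read has to search for the newest entry.

```python
class AppendOnlyStore:
    """Toy log-structured store: writes only append, reads take the newest entry."""

    def __init__(self):
        self.log = []  # (row_key, column, value) tuples in arrival order

    def write(self, row_key, column, value):
        # No lookup and no in-place rewrite: just append to the end of the log.
        self.log.append((row_key, column, value))

    def read(self, row_key, column):
        # Scan newest-to-oldest; the most recent write wins.
        for key, col, value in reversed(self.log):
            if key == row_key and col == column:
                return value
        return None


store = AppendOnlyStore()
store.write("bb@DAL", "EMAIL", "bb@example.com")
store.write("bb@DAL", "EMAIL", "none")
print(store.read("bb@DAL", "EMAIL"))  # none
print(len(store.log))                 # 2
```

The overwritten value is still in the log after the second write; nothing was rewritten in place.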
Random Reads
• Data is scattered around the store (have to get location and random read to look it up)
• Some indexing, and hopefully it’s in the vfs page cache, but still.
• ==> Reads “slower”
General Rules of Thumb
• De-normalize Everything
• Duplicate your data
• Organize it for reading
Containers
Keyspace
• For modeling, not much
• All data lives inside of Keyspaces
Columnfamily
• aka Table
• Grouping of similar data
• Has unique key/row space
• Where some structure is applied
Row
• Unique inside of a column family
• Key/Value where the Value is all of the columns in the row
• Can carry some additional meaning in the row name
• Typically “bucketing”
Column
• Holds data values
• Key/Value
• Can have meaning in the column name as well
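The container hierarchy above can be pictured as nested mappings. The sketch below uses plain Python dicts as a stand-in (keyspace, then column family, then row key, then column name); the helper name `get_columns` is hypothetical and the sample rows are taken from the Users CF slide.

```python
# keyspace -> column family -> row key -> {column name: value}
userdb = {
    "usersCF": {
        "mac@mac.com": {"NAME": "mac", "TWITTER": "@macmceniry"},
        "bb@example.com": {"NAME": "Bobtholomew", "IRC": "bb@DAL"},
    }
}


def get_columns(keyspace, cf, row_key):
    """Return a row's columns ordered by column name."""
    row = keyspace[cf].get(row_key, {})
    return sorted(row.items())


print(get_columns(userdb, "usersCF", "bb@example.com"))
# [('IRC', 'bb@DAL'), ('NAME', 'Bobtholomew')]
```

Note that two rows in the same column family do not need to share the same set of columns.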
Thrift Interface
• Operates on the raw rows and columns
• Many different language drivers
• Can use cassandra-cli to do this on the command line
Data Patterns
Users CF
mac@mac.com
    NAME    TWITTER        TAGS
    mac     @macmceniry    admin,super,cool
jsmith@mac.com
    NAME    ICQ         Employer    Hobby
    John    89403270    Smithco     Miniature Horses
bb@example.com
    NAME           IRC
    Bobtholomew    bb@DAL
liz@example.com
    NAME         TWITTER    Food          TAGS
    Elizabeth    @liz       Cheesecake    admin
steve@my.net
    NAME      IRC
    Steven    steve@DARK
Lookup by chat handle
ICQ CF
    89403270      jsmith@mac.com
IRC CF
    bb@DAL        bb@example.com
    steve@DARK    steve@my.net
Lookup by chat handle
ICQ CF
89403270
    EMAIL             NAME
    jsmith@mac.com    John
IRC CF
bb@DAL
    EMAIL             NAME
    bb@example.com    Bobtholomew
steve@DARK
    EMAIL           NAME
    steve@my.net    Steven
HandleCF
89403270
    EMAIL             NAME
    jsmith@mac.com    John
bb@DAL
    EMAIL             NAME
    bb@example.com    Bobtholomew
steve@DARK
    EMAIL           NAME
    steve@my.net    Steven
HandleCF
89403270
    TYPE    EMAIL             NAME
    ICQ     jsmith@mac.com    John
bb@DAL
    TYPE    EMAIL             NAME
    IRC     bb@example.com    Bobtholomew
steve@DARK
    TYPE    EMAIL           NAME
    IRC     steve@my.net    Steven
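The HandleCF layout is the "duplicate your data, organize it for reading" rule in action. A minimal sketch, using two Python dicts as stand-ins for the two column families (the function `add_user` is hypothetical, not a Cassandra API): every write to the users table is repeated into a handle-keyed table so that a lookup by chat handle is a single row read.

```python
users_cf = {}   # row key: email
handle_cf = {}  # row key: chat handle


def add_user(email, name, handles):
    """Write the user row plus one duplicated row per chat handle.

    handles maps a handle type (e.g. "IRC") to the handle string.
    """
    users_cf[email] = {"NAME": name, **handles}
    for handle_type, handle in handles.items():
        handle_cf[handle] = {"TYPE": handle_type, "EMAIL": email, "NAME": name}


add_user("bb@example.com", "Bobtholomew", {"IRC": "bb@DAL"})
add_user("steve@my.net", "Steven", {"IRC": "steve@DARK"})

# Lookup by chat handle is one row read; no scan of users_cf is needed.
print(handle_cf["bb@DAL"])
# {'TYPE': 'IRC', 'EMAIL': 'bb@example.com', 'NAME': 'Bobtholomew'}
```

The cost is on the write side: the name is stored twice and both copies must be written when it changes.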
How do I create these?
[default@userdb] create column family usersCF;
5ecec19a-3a43-3490-8c9a-3eb2901e2e97
Waiting for schema agreement...
... schemas agree across the cluster
[default@userdb] create column family handleCF;
df82135c-eb1f-3abf-b9df-02c605d571d5
Waiting for schema agreement...
... schemas agree across the cluster
How do I insert data?
[default@userdb] set handleCF[utf8('bb@DAL')]
… [utf8('NAME')] = utf8('Bobtholomew');
Value inserted.
Elapsed time: 22 msec(s).
[default@userdb] set handleCF[utf8('bb@DAL')]
… [utf8('EMAIL')] = utf8('bb@example.com');
Value inserted.
Elapsed time: 3.43 msec(s).
Users CF - TAGS
mac@mac.com
    NAME    TWITTER        TAGS
    mac     @macmceniry    admin,super,cool
jsmith@mac.com
    NAME    ICQ         Employer    Hobby
    John    89403270    Smithco     Miniature Horses
bb@example.com
    NAME           IRC
    Bobtholomew    bb@DAL
liz@example.com
    NAME         TWITTER    Food          TAGS
    Elizabeth    @liz       Cheesecake    admin
steve@my.net
    NAME      IRC
    Steven    steve@DARK
Users CF - TAGS
mac@mac.com
    NAME    TWITTER        TAGS:admin    TAGS:cool    TAGS:super
    mac     @macmceniry
jsmith@mac.com
    NAME    ICQ         Employer    Hobby
    John    89403270    Smithco     Miniature Horses
bb@example.com
    NAME           IRC
    Bobtholomew    bb@DAL
liz@example.com
    NAME         TWITTER    Food          TAGS:admin
    Elizabeth    @liz       Cheesecake
steve@my.net
    NAME      IRC
    Steven    steve@DARK
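Moving each tag from a comma-separated value into its own column name means a tag can be added or removed without reading and rewriting the whole list. A rough sketch with a dict standing in for one row (the helper names are hypothetical):

```python
def add_tag(row, tag):
    # The tag lives in the column name; the value is left empty.
    row["TAGS:" + tag] = ""


def remove_tag(row, tag):
    row.pop("TAGS:" + tag, None)


def tags_of(row):
    prefix = "TAGS:"
    return sorted(name[len(prefix):] for name in row if name.startswith(prefix))


mac = {"NAME": "mac", "TWITTER": "@macmceniry"}
for tag in ("admin", "super", "cool"):
    add_tag(mac, tag)
add_tag(mac, "admin")     # repeating a write changes nothing
remove_tag(mac, "super")
print(tags_of(mac))       # ['admin', 'cool']
```

Each operation touches a single column, and repeating it leaves the row in the same state.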
“Types”!
• Key Validator
• (Column) Comparator
• (Column Value) Default Validator, Metadata
• BytesType, AsciiType, UTF8Type, IntegerType, Int32Type, LongType, UUIDType, TimeUUIDType, DateType, BooleanType, FloatType, DoubleType, DecimalType, CounterColumnType (, CompositeTypes)
[default@userdb] set handleCF[utf8('bb@DAL')]
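The comparator matters because it decides the order in which columns are kept inside a row. The sketch below only imitates that effect with Python sorting; it assumes ASCII names, for which text ordering in Python matches a byte-wise UTF-8 comparison, and it does not call any Cassandra code.

```python
names = ["9", "10", "100", "2"]

# UTF8Type-style comparison: column names are compared as text.
as_text = sorted(names)

# IntegerType-style comparison: column names are compared as numbers.
as_integers = sorted(names, key=int)

print(as_text)      # ['10', '100', '2', '9']
print(as_integers)  # ['2', '9', '10', '100']
```

If column names are meant to be scanned in numeric or time order, the comparator has to match that meaning.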
What’s in a name?
• Can use row names and column names to add meaning
• Row name meaning creates a new distribution bin
• Column name meaning can create a data hierarchy
• No real change to the column family creation in the thrift interface (well, types depending on what you’re doing)
EventCF
mac@mac.com:20140203
    08:10:15    08:15:15    09:10:15
    join        update      logout
mac@mac.com:20140204
    08:11:23    08:14:57    18:45:12    18:50:52    19:01:29
    login       logout      login       logout      logout
mac@mac.com:20140205
    09:23:23    09:57:44
    login       logout
liz@example.com:20140203
    11:22:33    11:44:55    22:10:05    22:52:02
    login       logout      login       logout
liz@example.com:20140205
    08:11:23    08:14:57
    login       logout
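The EventCF layout combines both naming tricks: the row name buckets events per user per day, and the column name orders events within the bucket. A minimal sketch with a dict standing in for the column family (function names are hypothetical):

```python
from datetime import datetime

event_cf = {}  # row key "user:YYYYMMDD" -> {"hh:mm:ss": event}


def record_event(user, when, event):
    row_key = f"{user}:{when:%Y%m%d}"   # one row (bucket) per user per day
    column_name = f"{when:%H:%M:%S}"    # time of day orders events in the row
    event_cf.setdefault(row_key, {})[column_name] = event


def events_between(user, day, start, end):
    """Return (time, event) pairs with start <= time <= end from one bucket."""
    row = event_cf.get(f"{user}:{day}", {})
    return [(t, e) for t, e in sorted(row.items()) if start <= t <= end]


record_event("mac@mac.com", datetime(2014, 2, 4, 8, 11, 23), "login")
record_event("mac@mac.com", datetime(2014, 2, 4, 8, 14, 57), "logout")
record_event("mac@mac.com", datetime(2014, 2, 4, 18, 45, 12), "login")
record_event("mac@mac.com", datetime(2014, 2, 5, 9, 23, 23), "login")

print(events_between("mac@mac.com", "20140204", "08:00:00", "12:00:00"))
# [('08:11:23', 'login'), ('08:14:57', 'logout')]
```

A query for one user and one day touches exactly one row; a query spanning several days has to read one row per day.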
That was then, this is…
Now
• Same underlying structure - none of that has changed
• Rows - reference quickly - use for searching
• Columns - scan quickly - use for ordering
• But now have usage patterns
• Some have been codified into CQL
CQL
• Thrift alternative
• Simpler API
• Hides the structure of the internal storage
• 3 generations
• Only looking at CQL3 here
• cqlsh [-3]
How does handle look here?
cqlsh:userdb> CREATE TABLE handles (
          … handlename VARCHAR,
          … email VARCHAR,
          … name VARCHAR,
          … PRIMARY KEY (handlename)
          … );
cqlsh:userdb> INSERT INTO handles
          … (handlename, email, name) VALUES
          … ('bb@DAL', 'bb@example.com', 'Bobtholomew');
How does handle look here?
cqlsh:userdb> SELECT * FROM handles;
 handlename | email          | name
------------+----------------+-------------
     bb@DAL | bb@example.com | Bobtholomew
cqlsh:userdb> SELECT * FROM handles WHERE
          … handlename = 'bb@DAL';
 handlename | email          | name
------------+----------------+-------------
     bb@DAL | bb@example.com | Bobtholomew
How do I change it?
cqlsh:userdb> UPDATE handles SET email='none'
          … WHERE handlename = 'bb@DAL';
cqlsh:userdb> SELECT * FROM handles;
 handlename | email | name
------------+-------+-------------
     bb@DAL |  none | Bobtholomew
upsert
• Update instead of Insert
• Does the same thing (as long as it’s not a key)
• Insert instead of Update
• Overwrites data if it’s already there
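The upsert behavior can be pictured as INSERT and UPDATE both reducing to the same "set these columns on the row with this key" operation. The sketch below models that with a dict; `write`, `insert` and `update` are hypothetical names for illustration, not driver calls.

```python
handles = {}  # primary key (handlename) -> {column: value}


def write(handlename, **columns):
    """Set columns on the keyed row, creating the row if it is missing."""
    handles.setdefault(handlename, {}).update(columns)


insert = write  # an "INSERT" creates or overwrites
update = write  # an "UPDATE" creates or overwrites

update("bb@DAL", email="bb@example.com")             # row did not exist yet
insert("bb@DAL", email="none", name="Bobtholomew")   # row already existed
print(handles)
# {'bb@DAL': {'email': 'none', 'name': 'Bobtholomew'}}
```

Neither call fails: the update creates the row, and the insert silently overwrites the email that was already there.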
What about our event buckets from earlier?
• Can do the same thing
• Creating a composite key
• USERNAME:DATE
• Creating a composite column
• hh:mm:ss
EventCF
mac@mac.com:20140203
    08:10:15    08:15:15    09:10:15
    join        update      logout
mac@mac.com:20140204
    08:11:23    08:14:57    18:45:12    18:50:52    19:01:29
    login       logout      login       logout      logout
mac@mac.com:20140205
    09:23:23    09:57:44
    login       logout
liz@example.com:20140203
    11:22:33    11:44:55    22:10:05    22:52:02
    login       logout      login       logout
liz@example.com:20140205
    08:11:23    08:14:57
    login       logout
cqlsh:userdb> CREATE TABLE events (
          … username VARCHAR,
          … d VARCHAR,
          … hr INT,
          … min INT,
          … sec INT,
          … event VARCHAR,
          … PRIMARY KEY ( (username,d), hr, min, sec ) );
… PRIMARY KEY ( (username,d), hr, min, sec ) );
(username,d): ROW NAME (C* 1.2)
hr, min, sec: COLUMN NAME
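The mapping from the PRIMARY KEY to the row name and column name can be sketched as a small function. This is an approximation for illustration: the ":" separator and the text rendering are assumptions made to match the EventCF slide, and the real on-disk composite encoding is binary and not shown here.

```python
def to_storage(row):
    """Map one CQL-style row of the events table to (row name, column name, value)."""
    # The partition key (username, d) becomes the row name.
    row_name = f"{row['username']}:{row['d']}"
    # The clustering columns hr, min, sec prefix the column name of the stored value.
    column_name = f"{row['hr']:02d}:{row['min']:02d}:{row['sec']:02d}:event"
    return row_name, column_name, row["event"]


cql_row = {"username": "mac@mac.com", "d": "20140203",
           "hr": 8, "min": 10, "sec": 15, "event": "join"}
print(to_storage(cql_row))
# ('mac@mac.com:20140203', '08:10:15:event', 'join')
```

Rows that share (username, d) end up in the same storage row, ordered by hr, min, sec.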
Tags
• CQL has collections
• map, list, set
• Collections are built similarly to small/special composite columns
• Can add to our existing handle table
cqlsh:userdb> ALTER TABLE handles ADD tags SET<VARCHAR>;
cqlsh:userdb> UPDATE handles
          … SET tags = {'admin', 'foo'}
          … WHERE handlename = 'bb@DAL';
bb@DAL
    email             name           tags:admin    tags:foo
    bb@example.com    Bobtholomew
Design the data model so that it’s idempotent (eBay)
• Counter versus Collection (what question is being asked?)
Likes A
    Count
    100
Likes B
    Count
    200
Likes C
    Count
    300
Likes A
    +user11       +user12       -user11       +user13
    1393287359    1393280912    1393281942    1393212345
Likes B
    +user12
    1393287359    1393287100    1393287100    1393287100
Likes C
    +user11       +user12       +user13       +user14
    1393287359    1393287100    1393287100    1393287100
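The difference between the two Likes layouts shows up when a write is retried. In the sketch below (plain Python, with small integers as made-up timestamps), replaying counter increments changes the answer, while replaying per-user like/unlike records does not, because the newest timestamp per user wins either way.

```python
def apply_counter(count, ops):
    for delta in ops:
        count += delta
    return count


def apply_likes(columns, ops):
    """ops are (user, liked, timestamp); the newest timestamp per user wins."""
    for user, liked, ts in ops:
        if user not in columns or ts >= columns[user][1]:
            columns[user] = (liked, ts)
    return columns


def like_count(columns):
    return sum(1 for liked, _ in columns.values() if liked)


counter_ops = [+1, +1, -1, +1]
like_ops = [("user11", True, 1), ("user12", True, 2),
            ("user11", False, 3), ("user13", True, 4)]

# Apply once, then replay the same operations (e.g. after a retried request).
once = apply_counter(0, counter_ops)
twice = apply_counter(once, counter_ops)
print(once, twice)  # 2 4

likes_once = like_count(apply_likes({}, like_ops))
likes_twice = like_count(apply_likes(apply_likes({}, like_ops), like_ops))
print(likes_once, likes_twice)  # 2 2
```

The collection form also answers "who liked this?", which a bare count cannot; the counter answers "how many?" with a single read.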
Go Forth and Model!
Thank You!
PS… Sony Network is hiring!