CQL3 and Data Modeling 101 with Apache Cassandra

CQL3 & Data Modeling 101 with Apache Cassandra San Diego Cassandra Meetup Feb 27, 2014

description

Introduction to CQL3 and Cassandra Data Modeling. Presented at SD Cassandra Meetup 2014.02.27

Transcript of CQL3 and Data Modeling 101 with Apache Cassandra

Page 1: CQL3 and Data Modeling 101 with Apache Cassandra

CQL3 & Data Modeling 101

with Apache Cassandra

San Diego Cassandra Meetup Feb 27, 2014

Page 2: CQL3 and Data Modeling 101 with Apache Cassandra

Not how you’ve done it before…

Page 3: CQL3 and Data Modeling 101 with Apache Cassandra

In the beginning…There was

the Row, and

the Column

And the Row was fast to find and scale,

And the Column was fast to order.

Page 4: CQL3 and Data Modeling 101 with Apache Cassandra

Cassandra Properties

Page 5: CQL3 and Data Modeling 101 with Apache Cassandra

C*

• Column Oriented

• Log Structured

• Distributed Database

Page 6: CQL3 and Data Modeling 101 with Apache Cassandra

Column Oriented

• Columns actually hold the data

• Key/Value pair

• Name can be used to store meaning as well

Page 7: CQL3 and Data Modeling 101 with Apache Cassandra

Distributed Database

• Rows are used to distribute data across the cluster

• C* pulls the entire row into memory

• Can pull out individual parts or write to individual parts, but it’s still considered together
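A minimal sketch of "rows are used to distribute", assuming a toy scheme where a hash of the row key picks the node. The node names and the modulo rule are hypothetical; Cassandra's real partitioners assign token ranges on a ring, which this does not reproduce.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members


def node_for(row_key: str) -> str:
    """Pick a node from a hash of the row key (toy stand-in for a partitioner)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


# Every column of a row shares the row key, so the whole row lands on one node.
placement = {
    key: node_for(key)
    for key in ["mac@mac.com", "liz@example.com", "steve@my.net"]
}
```

The point is only that the row key, not the columns, decides where data lives.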

Page 8: CQL3 and Data Modeling 101 with Apache Cassandra

Log Structured Updates

• Commitlog and sstables are log structured

• Oriented around appending (streaming at a known location)

• ==> Writes quickly

• And you want to avoid rewrites
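The append-only idea above can be sketched as follows. This is a toy in-memory model, not Cassandra's commitlog or sstable format; the class and method names are made up for illustration.

```python
class AppendOnlyStore:
    """Toy log-structured store: writes append, reads look for the newest entry."""

    def __init__(self):
        self.log = []  # (row_key, column_name, value) tuples in write order

    def write(self, row_key, column, value):
        # Never modify an earlier entry; just append a newer one.
        self.log.append((row_key, column, value))

    def read(self, row_key, column):
        # The newest entry wins, so scan from the end of the log.
        for key, col, value in reversed(self.log):
            if key == row_key and col == column:
                return value
        return None


store = AppendOnlyStore()
store.write("bb@DAL", "EMAIL", "bb@example.com")
store.write("bb@DAL", "EMAIL", "none")  # an "update" is just another append
```

Writes stay cheap because nothing is rewritten in place; the cost moves to the read side, which has to find the newest entry.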

Page 9: CQL3 and Data Modeling 101 with Apache Cassandra

Random Reads

• Data is scattered around the store (have to get location and random read to look it up)

• Some indexing, and hopefully it’s in the vfs page cache, but still.

• ==> Reads “slower”

Page 10: CQL3 and Data Modeling 101 with Apache Cassandra

General Rules of Thumb

• De-normalize Everything

• Duplicate your data

• Organize it for reading

Page 11: CQL3 and Data Modeling 101 with Apache Cassandra

Containers

Page 12: CQL3 and Data Modeling 101 with Apache Cassandra

Keyspace

• For modeling, not much to decide here

• All data lives inside of Keyspaces

Page 13: CQL3 and Data Modeling 101 with Apache Cassandra

Column Family

• aka Table

• Grouping of similar data

• Has unique key/row space

• Where some structure is applied

Page 14: CQL3 and Data Modeling 101 with Apache Cassandra

Row

• Unique inside of a column family

• Key/Value where the Value is all of the columns in the row

• Can attach some additional meaning to the row name

• Typically “bucketing”

Page 15: CQL3 and Data Modeling 101 with Apache Cassandra

Column

• Holds data values

• Key/Value

• Can have meaning in the column name as well

Page 16: CQL3 and Data Modeling 101 with Apache Cassandra

Thrift Interface

• Operates on the raw rows and columns

• Many different language drivers

• Can use cassandra-cli to do this on the command line

Page 17: CQL3 and Data Modeling 101 with Apache Cassandra

Data Patterns

Page 18: CQL3 and Data Modeling 101 with Apache Cassandra

Users CF

mac@mac.com
  NAME         TWITTER      TAGS
  mac          @macmceniry  admin,super,cool

jsmith@mac.com
  NAME         ICQ          Employer    Hobby
  John         89403270     Smithco     Miniature Horses

bb@example.com
  NAME         IRC
  Bobtholomew  bb@DAL

liz@example.com
  NAME         TWITTER      Food        TAGS
  Elizabeth    @liz         Cheesecake  admin

steve@my.net
  NAME         IRC
  Steven       steve@DARK

Page 19: CQL3 and Data Modeling 101 with Apache Cassandra

Lookup by chat handle

ICQ CF

89403270
  EMAIL
  jsmith@mac.com

IRC CF

bb@DAL
  EMAIL
  bb@example.com

steve@DARK
  EMAIL
  steve@my.net

Page 20: CQL3 and Data Modeling 101 with Apache Cassandra

Lookup by chat handle

ICQ CF

89403270
  EMAIL           Name
  jsmith@mac.com  John

IRC CF

bb@DAL
  EMAIL           Name
  bb@example.com  Bobtholomew

steve@DARK
  EMAIL           Name
  steve@my.net    Steven

Page 21: CQL3 and Data Modeling 101 with Apache Cassandra

HandleCF

89403270
  EMAIL           NAME
  jsmith@mac.com  John

bb@DAL
  EMAIL           NAME
  bb@example.com  Bobtholomew

steve@DARK
  EMAIL           NAME
  steve@my.net    Steven

Page 22: CQL3 and Data Modeling 101 with Apache Cassandra

HandleCF

89403270
  TYPE  EMAIL           NAME
  ICQ   jsmith@mac.com  John

bb@DAL
  TYPE  EMAIL           NAME
  IRC   bb@example.com  Bobtholomew

steve@DARK
  TYPE  EMAIL           NAME
  IRC   steve@my.net    Steven
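As a rough sketch of the de-normalization on these slides, the snippet below derives a handle-keyed copy from a user-keyed one, using plain dictionaries as stand-ins for column families. The function name and the dictionary layout are illustrative assumptions, not a Cassandra API.

```python
# Users CF: row key is the email, columns differ per row.
users_cf = {
    "jsmith@mac.com": {"NAME": "John", "ICQ": "89403270"},
    "bb@example.com": {"NAME": "Bobtholomew", "IRC": "bb@DAL"},
    "steve@my.net": {"NAME": "Steven", "IRC": "steve@DARK"},
}


def build_handle_cf(users):
    """Duplicate email and name under each chat handle so a lookup is one row read."""
    handle_cf = {}
    for email, columns in users.items():
        for handle_type in ("ICQ", "IRC"):
            handle = columns.get(handle_type)
            if handle is not None:
                handle_cf[handle] = {
                    "TYPE": handle_type,
                    "EMAIL": email,
                    "NAME": columns["NAME"],
                }
    return handle_cf


handle_cf = build_handle_cf(users_cf)
```

The name is stored twice on purpose: the handle lookup never has to go back to the users data to answer "who is this?".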

Page 23: CQL3 and Data Modeling 101 with Apache Cassandra

How do I create these?

[default@userdb] create column family usersCF;
5ecec19a-3a43-3490-8c9a-3eb2901e2e97
Waiting for schema agreement...
... schemas agree across the cluster
[default@userdb] create column family handleCF;
df82135c-eb1f-3abf-b9df-02c605d571d5
Waiting for schema agreement...
... schemas agree across the cluster

Page 24: CQL3 and Data Modeling 101 with Apache Cassandra

How do I insert data?

[default@userdb] set handleCF[utf8('bb@DAL')]
               … [utf8('NAME')] = utf8('Bobtholomew');
Value inserted.
Elapsed time: 22 msec(s).
[default@userdb] set handleCF[utf8('bb@DAL')]
               … [utf8('EMAIL')] = utf8('bb@example.com');
Value inserted.
Elapsed time: 3.43 msec(s).

Page 25: CQL3 and Data Modeling 101 with Apache Cassandra

Users CF - TAGS

mac@mac.com
  NAME         TWITTER      TAGS
  mac          @macmceniry  admin,super,cool

jsmith@mac.com
  NAME         ICQ          Employer    Hobby
  John         89403270     Smithco     Miniature Horses

bb@example.com
  NAME         IRC
  Bobtholomew  bb@DAL

liz@example.com
  NAME         TWITTER      Food        TAGS
  Elizabeth    @liz         Cheesecake  admin

steve@my.net
  NAME         IRC
  Steven       steve@DARK

Page 26: CQL3 and Data Modeling 101 with Apache Cassandra

Users CF - TAGS

mac@mac.com
  NAME         TWITTER      TAGS:admin  TAGS:cool  TAGS:super
  mac          @macmceniry

jsmith@mac.com
  NAME         ICQ          Employer    Hobby
  John         89403270     Smithco     Miniature Horses

bb@example.com
  NAME         IRC
  Bobtholomew  bb@DAL

liz@example.com
  NAME         TWITTER      Food        TAGS:admin
  Elizabeth    @liz         Cheesecake

steve@my.net
  NAME         IRC
  Steven       steve@DARK
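The change between the two TAGS slides, moving each tag out of a comma-separated value and into its own column name, can be sketched like this. A dictionary stands in for one row, and the helper names are hypothetical.

```python
row = {"NAME": "mac", "TWITTER": "@macmceniry"}  # one Users CF row


def add_tag(columns, tag):
    # The tag lives in the column name; the value is left empty.
    columns["TAGS:" + tag] = ""


def remove_tag(columns, tag):
    # Dropping one tag touches one column, not a whole comma-separated string.
    columns.pop("TAGS:" + tag, None)


def tags(columns):
    # Columns are ordered by name, so the tags come back sorted.
    return [
        name.split(":", 1)[1]
        for name in sorted(columns)
        if name.startswith("TAGS:")
    ]


for t in ("super", "admin", "cool"):
    add_tag(row, t)
add_tag(row, "admin")  # adding an existing tag does not create a duplicate
```

Adding or removing a tag becomes a single column write, with no read-modify-write of a packed string.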

Page 27: CQL3 and Data Modeling 101 with Apache Cassandra

“Types”!

• Key Validator

• (Column) Comparator

• (Column Value) Default Validator, Metadata

• BytesType, AsciiType, UTF8Type, IntegerType, Int32Type, LongType, UUIDType, TimeUUIDType, DateType, BooleanType, FloatType, DoubleType, DecimalType, CounterColumnType (, CompositeTypes)

[default@userdb] set handleCF[utf8('bb@DAL')]

Page 28: CQL3 and Data Modeling 101 with Apache Cassandra

What’s in a name?

• Can use row names and column names to add meaning

• Row name meaning creates a new distribution bin

• Column name meaning can create a data hierarchy

• No real change to the column family creation in the thrift interface (well, types depending on what you’re doing)

Page 29: CQL3 and Data Modeling 101 with Apache Cassandra

EventCF

mac@mac.com:20140203
  08:10:15  08:15:15  09:10:15
  join      update    logout

mac@mac.com:20140204
  08:11:23  08:14:57  18:45:12  18:50:52  19:01:29
  login     logout    login     logout    logout

mac@mac.com:20140205
  09:23:23  09:57:44
  login     logout

liz@example.com:20140203
  11:22:33  11:44:55  22:10:05  22:52:02
  login     logout    login     logout

liz@example.com:20140205
  08:11:23  08:14:57
  login     logout
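A small sketch of the bucketing shown in EventCF, assuming dictionaries as stand-ins for rows and columns. The helper names are made up; the only points are that the row name `email:date` selects the bucket and the `hh:mm:ss` column names sort into time order.

```python
from collections import defaultdict

event_cf = defaultdict(dict)  # row name -> {column name: value}


def record(email, date, time, event):
    # Row name "email:date" picks the bucket; column name "hh:mm:ss" orders within it.
    event_cf[f"{email}:{date}"][time] = event


def events_for_day(email, date):
    columns = event_cf.get(f"{email}:{date}", {})
    # Zero-padded hh:mm:ss strings sort chronologically.
    return [(t, columns[t]) for t in sorted(columns)]


record("mac@mac.com", "20140205", "09:57:44", "logout")
record("mac@mac.com", "20140205", "09:23:23", "login")
record("liz@example.com", "20140205", "08:11:23", "login")
```

One day of one user's activity is a single row, so reading it back is one row lookup followed by an ordered scan of its columns.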

Page 30: CQL3 and Data Modeling 101 with Apache Cassandra

That was then, this is…

Page 31: CQL3 and Data Modeling 101 with Apache Cassandra

Now

• Same underlying structure - none of that has changed

• Rows - reference quickly - use for searching

• Columns - scan quickly - use for ordering

• But now have usage patterns

• Some have been codified into CQL

Page 32: CQL3 and Data Modeling 101 with Apache Cassandra

CQL

• Thrift alternative

• Simpler API

• Hides the structure of the internal storage

• 3 generations

• Only looking at CQL3 here

• cqlsh [-3]

Page 33: CQL3 and Data Modeling 101 with Apache Cassandra

How does handle look here?

cqlsh:userdb> CREATE TABLE handles (
          …   handlename VARCHAR,
          …   email VARCHAR,
          …   name VARCHAR,
          …   PRIMARY KEY (handlename)
          … );
cqlsh:userdb> INSERT INTO handles
          …   (handlename, email, name) VALUES
          …   ('bb@DAL', 'bb@example.com', 'Bobtholomew');

Page 34: CQL3 and Data Modeling 101 with Apache Cassandra

handles Table

bb@DAL
  email           name
  bb@example.com  Bobtholomew

Page 35: CQL3 and Data Modeling 101 with Apache Cassandra

How does handle look here?

cqlsh:userdb> SELECT * FROM handles;

 handlename | email          | name
------------+----------------+-------------
 bb@DAL     | bb@example.com | Bobtholomew

cqlsh:userdb> SELECT * FROM handles WHERE
          …   handlename = 'bb@DAL';

 handlename | email          | name
------------+----------------+-------------
 bb@DAL     | bb@example.com | Bobtholomew

Page 36: CQL3 and Data Modeling 101 with Apache Cassandra

How do I change it?

cqlsh:userdb> UPDATE handles SET email='none'
          …   WHERE handlename = 'bb@DAL';
cqlsh:userdb> SELECT * FROM handles;

 handlename | email | name
------------+-------+-------------
 bb@DAL     | none  | Bobtholomew

Page 37: CQL3 and Data Modeling 101 with Apache Cassandra

upsert

• Update instead of Insert

    • Does the same thing (as long as it’s not a key)

• Insert instead of Update

    • Overwrites data if it’s already there
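The upsert behaviour above can be sketched with a toy table in which insert and update are the same write. The timestamp rule (newest write per cell wins) and all names here are assumptions for illustration, not driver or server code.

```python
table = {}  # primary key -> {column: (write_timestamp, value)}


def write(key, columns, timestamp):
    """Shared by INSERT and UPDATE in this sketch: newest timestamp per cell wins."""
    row = table.setdefault(key, {})
    for name, value in columns.items():
        if name not in row or timestamp >= row[name][0]:
            row[name] = (timestamp, value)


# Both verbs end up as the same kind of write.
insert = write
update = write

insert("bb@DAL", {"email": "bb@example.com", "name": "Bobtholomew"}, timestamp=1)
update("bb@DAL", {"email": "none"}, timestamp=2)       # overwrites one cell
update("steve@DARK", {"name": "Steven"}, timestamp=3)  # "update" of a missing row creates it
```

Neither operation checks whether the row already exists, which is why an update can create data and an insert can overwrite it.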

Page 38: CQL3 and Data Modeling 101 with Apache Cassandra

What about our event buckets from earlier?

• Can do the same thing

• Creating a composite key

    • USERNAME:DATE

• Creating a composite column

    • hh:mm:ss

Page 39: CQL3 and Data Modeling 101 with Apache Cassandra

EventCF

mac@mac.com:20140203
  08:10:15  08:15:15  09:10:15
  join      update    logout

mac@mac.com:20140204
  08:11:23  08:14:57  18:45:12  18:50:52  19:01:29
  login     logout    login     logout    logout

mac@mac.com:20140205
  09:23:23  09:57:44
  login     logout

liz@example.com:20140203
  11:22:33  11:44:55  22:10:05  22:52:02
  login     logout    login     logout

liz@example.com:20140205
  08:11:23  08:14:57
  login     logout

Page 40: CQL3 and Data Modeling 101 with Apache Cassandra

cqlsh:userdb> CREATE TABLE events (
          …   username VARCHAR,
          …   d VARCHAR,
          …   hr INT,
          …   min INT,
          …   sec INT,
          …   event VARCHAR,
          …   PRIMARY KEY ( (username,d), hr, min, sec ) );

Page 41: CQL3 and Data Modeling 101 with Apache Cassandra

… PRIMARY KEY ( (username,d), hr, min, sec ) );

(username,d) = ROW NAME (C* 1.2)

hr, min, sec = COLUMN NAME
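To make the row name / column name split concrete, here is a hedged sketch that maps CQL rows of the `events` table onto a row name and a composite column name. The tuple and string encodings are illustrative only; the actual on-disk encoding of composites is different.

```python
def to_storage(cql_row):
    """Split one CQL row of the events table into (row name, column name, value)."""
    # Partition key (username, d) becomes the row name.
    row_name = f"{cql_row['username']}:{cql_row['d']}"
    # Clustering columns hr, min, sec prefix the column name for the "event" value.
    column_name = (cql_row["hr"], cql_row["min"], cql_row["sec"], "event")
    return row_name, column_name, cql_row["event"]


rows = [
    {"username": "mac@mac.com", "d": "20140203", "hr": 9, "min": 10, "sec": 15, "event": "logout"},
    {"username": "mac@mac.com", "d": "20140203", "hr": 8, "min": 10, "sec": 15, "event": "join"},
]

storage = {}
for r in rows:
    row_name, column_name, value = to_storage(r)
    storage.setdefault(row_name, {})[column_name] = value

# Within one row, columns sort by (hr, min, sec), giving time order.
ordered = sorted(storage["mac@mac.com:20140203"].items())
```

Both CQL rows collapse into one storage row because they share `(username, d)`; the clustering columns only decide the order inside it.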

Page 42: CQL3 and Data Modeling 101 with Apache Cassandra

Tags

• CQL has collections

• map, list, set

• Collections are built similarly to small/special composite columns

• Can add to our existing handle table

Page 43: CQL3 and Data Modeling 101 with Apache Cassandra

cqlsh:userdb> ALTER TABLE handles ADD tags SET<VARCHAR>;
cqlsh:userdb> UPDATE handles
          …   SET tags = {'admin', 'foo'}
          …   WHERE handlename = 'bb@DAL';

bb@DAL
  email           name         tags:admin  tags:foo
  bb@example.com  Bobtholomew

Page 44: CQL3 and Data Modeling 101 with Apache Cassandra

Design the data model so that it’s idempotent (eBay)

• Counter versus Collection (what question is being asked?)

Likes A
  Count
  100

Likes B
  Count
  200

Likes C
  Count
  300

Likes A
  +user11     +user12     -user11     +user13
  1393287359  1393280912  1393281942  1393212345

Likes B
  +user12
  1393287359  1393287100  1393287100  1393287100

Likes C
  +user11     +user12     +user13     +user14
  1393287359  1393287100  1393287100  1393287100
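A sketch of why the collection form is idempotent while the counter is not. It assumes one reading of the slide: each `+user`/`-user` column holds a timestamp, and for a given user the marker with the newest timestamp wins. That resolution rule and the function names are assumptions, not something the slide states.

```python
def apply_counter(count, increments):
    # Each replayed "+1" changes the total again: not idempotent.
    return count + sum(increments)


def apply_collection(columns, ops):
    # ops are (column_name, timestamp); rewriting the same column changes nothing.
    for name, ts in ops:
        columns[name] = ts
    return columns


def current_likers(columns):
    """Resolve +user/-user markers: the marker with the newest timestamp wins (assumed rule)."""
    latest = {}
    for name, ts in columns.items():
        sign, user = name[0], name[1:]
        if user not in latest or ts > latest[user][0]:
            latest[user] = (ts, sign)
    return sorted(user for user, (_, sign) in latest.items() if sign == "+")


ops = [
    ("+user11", 1393287359),
    ("+user12", 1393280912),
    ("-user11", 1393281942),
    ("+user13", 1393212345),
]
once = apply_collection({}, ops)
twice = apply_collection(dict(once), ops)  # simulate a retried write
```

Replaying the collection writes leaves the row unchanged, while replaying a counter increment moves the count again. The collection also answers "who liked this?", which a bare count cannot.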

Page 45: CQL3 and Data Modeling 101 with Apache Cassandra

Go Forth and Model!

Thank You!

PS… Sony Network is hiring!