Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

- Introduction to CQL3 and DataModeling (Johnny Miller, Cassandra Solutions Architect, DataStax): Johnny Miller is an experienced developer, architect, team lead and agile coach with a history of working at Sky, AOL Broadband and Alcatel-Lucent. Johnny has architected and delivered a number of platforms using Cassandra as a key component for achieving high availability and efficient scaling.

Transcript of Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Page 1: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Introduction to CQL and Data Modeling Helsinki Cassandra Meetup 10th February 2014

©2014 DataStax Confidential. Do not distribute without consent.

Page 2: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Agenda

•  Introduction
•  CQL Basics
•  Data Modeling
•  Time Series/Sensor Data
•  Java Driver

Page 3: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

About me

Johnny Miller DataStax Solutions Architect www.datastax.com @DataStax @CyanMiller https://www.linkedin.com/in/johnnymiller [email protected]

Page 4: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

DataStax

•  Founded in April 2010

•  We drive Apache Cassandra™

•  400+ customers (20 of the Fortune 100)

•  200+ employees

•  Home to Apache Cassandra Chair & most committers

•  Headquartered in San Francisco Bay area

•  European headquarters established in London

Our Goal: To be the first and best database choice for online applications

Page 5: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

DataStax

•  DataStax supports both the open source community and enterprises.

Open Source/Community
•  Apache Cassandra (employ Cassandra chair and 90+% of the committers)
•  DataStax Community Edition
•  DataStax OpsCenter
•  DataStax DevCenter
•  DataStax Drivers/Connectors
•  Online Documentation
•  Online Training
•  Mailing lists and forums

Enterprise Software
•  DataStax Enterprise Edition
   •  Certified Cassandra
   •  Built-in Analytics
   •  Built-in Enterprise Search
   •  Enterprise Security
•  DataStax OpsCenter
•  Expert Support
•  Consultative Help
•  Professional Training

Page 6: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Cassandra Adoption

Source: http://db-engines.com/en/ranking, Feb 2014

Page 7: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

A sample of Cassandra & DataStax Enterprise users

Page 8: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Why Good Data Modeling is Important

•  Cassandra is a highly available, highly scalable, & highly distributed database, with no single point of failure

•  To achieve this, Cassandra is optimized for non-relational data models.
   •  Joins do not function well on distributed databases.
   •  Locking and transactions jam up distributed nodes.

•  By modeling data properly for Cassandra you can avoid joins, locking, and transactions for your application.

Page 9: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

CQL Basics YesCQL

Page 10: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

CQL Basics

•  Cassandra Query Language
   •  SQL-like language to query Cassandra
   •  Limited predicates. Attempts to prevent bad queries
   •  but, you can still get into trouble!

•  Keyspace – analogous to a schema.
   •  Has various storage attributes.
   •  The keyspace determines the replication factor (RF).

•  Table – looks like a SQL Table.
   •  A table must have a Primary Key.
   •  We can fully qualify a table as <keyspace>.<table>

Page 11: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

DevCenter

•  DataStax DevCenter – a free, visual query tool for creating and running CQL statements against Cassandra and DataStax Enterprise.

Page 12: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

CQL Basics

•  Usual statements
   •  CREATE / DROP / ALTER TABLE
   •  SELECT

BUT
•  INSERT and UPDATE are similar to each other
   •  If a row doesn’t exist, UPDATE will insert it, and if it exists, INSERT will replace it.
   •  Think of it as an UPSERT
   •  Therefore we never get a key violation

•  For updates, Cassandra never reads the existing row first
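The upsert behaviour described on this slide can be sketched with a toy in-memory model. This is only an illustration in Python, not how Cassandra stores data: the `table` dict, the `write` helper and the city values are assumptions made up for the example.

```python
# Toy model (an assumption for illustration): a table is a dict keyed by primary key.
table = {}

def write(primary_key, **columns):
    """Stand-in for both INSERT and UPDATE: create the row if missing, then set the given columns."""
    table.setdefault(primary_key, {}).update(columns)

write("Helsinki", population=600000)  # acts like an INSERT of a new row
write("Helsinki", elevation=17)       # acts like an UPDATE of an existing row
write("Helsinki", population=610000)  # a repeated "INSERT" overwrites, with no key violation

print(table)
```

Note that no write ever checks whether the key already exists, which is why a key violation cannot occur in this model.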

Page 13: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Creating a keyspace - Single Data Centre Consistency

Page 14: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Creating a keyspace - Multiple Data Centre Consistency

Page 15: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

CQL Basics – creating a table

CREATE TABLE cities (
  city_name varchar,
  elevation int,
  population int,
  latitude float,
  longitude float,
  PRIMARY KEY (city_name)
);

•  We can visualize it this way:

•  city_name is the partition key

•  In this example, the partition key = primary key

Page 16: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

CQL Basics – Composite Primary Key

The Primary Key
•  The key uniquely identifies a row.
•  A composite primary key consists of:
   •  A partition key
   •  One or more clustering columns

e.g. PRIMARY KEY (partition key, cluster columns, ...)
•  The partition key determines on which node the partition resides
•  Data is ordered in cluster column order within the partition

Page 17: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

CQL Basics – Composite Primary Key

CREATE TABLE sporty_league (
  team_name varchar,
  player_name varchar,
  jersey int,
  PRIMARY KEY (team_name, player_name)
);

Page 18: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

CQL Basics – Simple Select

SELECT * FROM sporty_league;

•  More than a few rows can be slow. (Limited to 10,000 rows by default)
•  Use LIMIT keyword to choose fewer or more rows

Page 19: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

CQL Basics - Simple Select on Partition Key and Cluster Columns

SELECT * FROM sporty_league WHERE team_name = 'Mighty Mutts';

SELECT * FROM sporty_league WHERE team_name = 'Mighty Mutts' AND player_name = 'Lucky';

Page 20: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

CQL Basics – Insert/Update

INSERT INTO sporty_league (team_name, player_name, jersey) VALUES ('Mighty Mutts', 'Felix', 90);

Page 21: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

CQL Basics - Ordering

•  Partition keys are not ordered, but the cluster columns are.
•  However, you can only order by a column if it’s a cluster column.
•  Data will be returned by default in the order of the clustering column.
•  You can also use the ORDER BY keyword – but only on the clustering column!

SELECT * FROM sporty_league WHERE team_name = 'Mighty Mutts' ORDER BY player_name DESC;
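As a rough mental model of partition keys and clustering columns, the sporty_league table can be imitated in Python. This is a sketch under stated assumptions, not Cassandra's storage engine: the `partitions` structure and the `insert`/`select` helpers are hypothetical, and every jersey number except Felix's 90 is made up.

```python
from collections import defaultdict

# Assumption: partition key -> {clustering column value -> remaining columns}.
partitions = defaultdict(dict)

def insert(team_name, player_name, jersey):
    """Place the row in its team's partition, keyed by the clustering column."""
    partitions[team_name][player_name] = {"jersey": jersey}

def select(team_name, descending=False):
    """Return one partition's rows ordered by the clustering column (player_name)."""
    rows = partitions.get(team_name, {})
    return [(name, rows[name]["jersey"]) for name in sorted(rows, reverse=descending)]

insert("Mighty Mutts", "Lucky", 7)
insert("Mighty Mutts", "Felix", 90)
insert("Mighty Mutts", "Buddy", 3)

print(select("Mighty Mutts"))                   # ascending by player_name
print(select("Mighty Mutts", descending=True))  # like ORDER BY player_name DESC
```

Ordering is only available inside a partition in this model, which mirrors why ORDER BY is restricted to clustering columns.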

Page 22: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

CQL Basics – Group By

•  We have already done this!
•  The partition key effectively names the columns for grouping.
•  The previous table contained all of the players grouped by their team_name.

Page 23: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

CQL Basics - Predicates

•  On the partition key: = and IN
•  On the cluster columns: <, <=, =, >=, >, IN

Page 24: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

CQL Basics – Composite Partition Key

CREATE TABLE cities (
  city_name varchar,
  state varchar,
  PRIMARY KEY ((city_name, state))
);

•  Each city gets its own partition!

Page 25: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

CQL Basics – Performance considerations

•  The best queries are in a single partition, i.e. WHERE partition key = <something>

•  Each new partition requires a new disk seek.
   •  Queries that span multiple partitions are s-l-o-w
   •  Queries that span multiple cluster columns are fast

Page 26: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

CQL Basics – Authentication and Authorisation

•  CQL supports creating users and granting them access to tables etc.
•  You need to enable authentication in the cassandra.yaml config file.
•  You can create, alter, drop and list users
•  You can then GRANT permissions to users accordingly – ALTER, AUTHORIZE, DROP, MODIFY, SELECT.

Page 27: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

CQL Basics - Tracing

•  You can turn tracing on or off for queries with the TRACING ON | OFF command.

•  This can help you understand what Cassandra is doing and identify any performance problems.

•  http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2

Page 28: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

CQL Basics – TTL

•  Expiring Columns, or Time to Live (TTL)

INSERT INTO users (id, first, last) VALUES ('abc123', 'abe', 'lincoln') USING TTL 3600;

// Expires data in one hour

Page 29: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

CQL Basics – Data Types

Page 30: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

CQL Basics – Data Types: Collections

•  CQL supports having columns that contain collections of data. •  The collection types include:

•  Set, List and Map.

•  These data types are intended to support the type of 1-to-many relationships that can be modeled in a relational DB e.g. a user has many email addresses.

•  Some performance considerations around collections.

•  Requires serialization so don’t go crazy!

•  Often more efficient to denormalise further rather than use collections if intending to store lots of data.

•  Favour sets over lists – lists are not very performant

CREATE TABLE collections_example (
  id int PRIMARY KEY,
  set_example set<text>,
  list_example list<text>,
  map_example map<int, text>
);

Page 31: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

CQL Basics – Data Types: Counters

•  Stores a number that incrementally counts the occurrences of a particular event or process.

UPDATE UserActions SET total = total + 2 WHERE user = 123 AND action = 'xyz';

Page 32: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

CQL Basics - Lightweight Transactions

•  Introduced in Cassandra 2.0
   •  DSE 4 will include Cassandra 2.0 (due soon…)
   •  DSE 3.2 (current version) is using Cassandra 1.2
•  Uses the Paxos consensus protocol to obtain an agreement across the cluster.
•  Example:

INSERT INTO customer_account (customerID, customer_email)
VALUES ('LauraS', '[email protected]')
IF NOT EXISTS;

UPDATE customer_account SET customer_email = '[email protected]'
WHERE customerID = 'LauraS'
IF customer_email = '[email protected]';

•  Great for 1% of your application – but not recommended to be used too much!
•  Eventual consistency is your friend: http://www.slideshare.net/planetcassandra/c-summit-2013-eventual-consistency-hopeful-consistency-by-christos-kalantzis

Page 33: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Data Modeling Query based and denormalised

Page 34: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Cassandra is not a relational database

•  Cassandra doesn’t work the same way as an RDBMS
•  Your data modeling approach won’t work the same way either
•  No foreign keys
•  No joins

Page 35: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Query-Driven Data Modeling

•  Start by addressing the queries that you will need to answer
   •  Your data should be able to match them directly

•  Think about:
   •  The actions your application needs to perform
   •  How you want to access the data
   •  What are the use cases?
   •  What does the data look like?

Page 36: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Query-Driven Data Modeling contd.

•  What are you trying to retrieve?
   •  Does it need to be ordered?
   •  Is there any nesting of data?
   •  Do you need to group data?
   •  Do you need to filter data?

•  Does data expire?
•  Does data need to be retrieved in chronological order?

Page 37: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Denormalisation

•  Combine table columns into a single view, i.e. a materialized view
   •  We have to create a table that stores all the data that would be in the view

•  Remember - no joins in Cassandra!

Advantage:
•  Having the data stored in this manner greatly improves performance
   •  Less seeking
   •  Less network traffic

Disadvantage:
•  Data duplication
   •  different tables for different queries
   •  you will use more disk space – but disks are cheap!

Page 38: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Avoid client-side joins

•  What is a client-side join?
   •  Querying a table from Cassandra
   •  Using the results from the first query to query a second table

•  Why avoid?
   •  Degrades performance, i.e. more I/O, seeks and traffic

Page 39: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Don’t be scared of writes

•  Cassandra is the fastest DB there is for writes.
•  Writing to multiple tables is not going to be slow!
•  3,000-5,000 writes/second/core, e.g. 8 core server = 24k-40k writes per second!
•  < 1ms typical for most writes (varies based on hardware)

Page 40: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Performance

Netflix Cloud Benchmark…
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

“In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments with a linear increasing throughput.” Solving Big Data Challenges for Enterprise Application Performance Management, Tilmann Rabl, et al., August 2013, p. 10. Benchmark paper presented at the Very Large Database Conference, 2013. http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2013.pdf

End Point independent NoSQL Benchmark
•  Highest in throughput vs MongoDB and HBase
•  Lowest in latency vs MongoDB and HBase

http://www.datastax.com/wp-content/uploads/2013/02/WP-Benchmarking-Top-NoSQL-Databases.pdf

Page 41: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

One-to-many

•  Relationship without being relational
•  Example – Users have many videos
•  Wait? Where is the foreign key?

Page 42: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

One-to-Many

CREATE TABLE videos (
  videoid uuid,
  videoname varchar,
  username varchar,
  description varchar,
  tags varchar,
  upload_date timestamp,
  PRIMARY KEY (videoid)
);

•  Static table to store videos
•  UUID for unique video id
•  Add username to denormalize

CREATE TABLE username_video_index (
  username varchar,
  videoid uuid,
  upload_date timestamp,
  video_name varchar,
  PRIMARY KEY (username, videoid)
);

SELECT video_name FROM username_video_index WHERE username = 'tcodd' AND videoid = '99051fe9'

•  Lookup video by username

Write in two tables at once for fast lookups

Page 43: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Many-to-many

•  Example - users and videos have many comments.

Page 44: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Many-to-many

•  Model both sides of the view
•  Insert both when comment is created
•  Materialized views from either side

CREATE TABLE comments_by_video (
  videoid uuid,
  username varchar,
  comment_ts timestamp,
  comment varchar,
  PRIMARY KEY (videoid, username)
);

CREATE TABLE comments_by_user (
  username varchar,
  videoid uuid,
  comment_ts timestamp,
  comment varchar,
  PRIMARY KEY (username, videoid)
);

DON’T BE AFRAID OF WRITES
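The "insert both when a comment is created" idea can be sketched as a dual write. The Python below is a minimal illustration that uses two dicts as stand-ins for the two tables; the `add_comment` helper and the sample values are assumptions, and a real application would issue two CQL INSERTs through a driver instead.

```python
from collections import defaultdict

# Hypothetical stand-ins for the two tables: partition key -> {clustering key -> row}.
comments_by_video = defaultdict(dict)
comments_by_user = defaultdict(dict)

def add_comment(videoid, username, comment_ts, comment):
    """Write the same comment to both views so either side can be read with one lookup."""
    row = {"comment_ts": comment_ts, "comment": comment}
    comments_by_video[videoid][username] = row
    comments_by_user[username][videoid] = row

add_comment("video-1", "tcodd", 1, "Great talk")
add_comment("video-1", "jmiller", 2, "Thanks!")

print(sorted(comments_by_video["video-1"]))  # users who commented on the video
print(list(comments_by_user["tcodd"]))       # videos this user commented on
```

The cost is one extra write per comment; the benefit is that neither read needs a join or a second query.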

Page 45: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Partition Key is not the same as a Primary Key

•  Within a table, a row is referenced by a partition key
   •  This is either your primary key or the first part of a compound primary key

Similarities
•  Partition key identifies a partition as being separate from other partitions
•  Must be unique within a table

Differences
•  Inserting a new record with a partition key that already exists doesn’t do what you’re used to in a RDBMS, i.e. no primary key violations
•  An INSERT using an existing partition key is allowed
•  As a consequence, INSERT and UPDATE act in the same way, i.e. UPSERT

Page 46: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

How to avoid UPSERTS

•  Guarantee that your primary keys are unique from one another
   •  Use an appropriate natural key based on your data
   •  Use a surrogate key for partition key

Risks with natural keys
•  Depending on the type of natural key that is used, there may still be an increased risk of UPSERTs
•  Changing the datum used for a Natural Key requires a lot of overhead.

•  So why not use a sequence to generate a surrogate key?
   •  You can’t – Cassandra doesn’t provide sequences!

Page 47: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

What, no sequences?

•  Sequences are a handy feature in RDBMS for auto-creation of IDs for your data.
   •  Guaranteed unique
   •  E.g. INSERT INTO user (id, firstName, LastName) VALUES (seq.nextVal(), 'Ted', 'Codd')

•  Cassandra has no sequences!
   •  Extremely difficult in a masterless distributed system
   •  Requires a lock (perf killer)

•  What to do?
   •  Use part of the data to create a unique key
   •  Use a UUID

Page 48: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

UUID

•  Universally Unique Identifier
•  128 bit number represented in character form e.g. 99051fe9-6a9c-46c2-b949-38ef78858dd0
•  Easily generated on the client
   •  Version 1 has a timestamp component
   •  Version 4 has no timestamp component
      •  Faster to generate
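Generating both UUID versions on the client is a one-liner in most languages. As an illustration, Python's standard `uuid` module provides `uuid1()` (time-based) and `uuid4()` (random); the variable names below are arbitrary.

```python
import uuid

time_based = uuid.uuid1()    # version 1: includes a timestamp component
random_based = uuid.uuid4()  # version 4: random, no timestamp component

print(time_based, "version", time_based.version)
print(random_based, "version", random_based.version)
print(len(str(random_based)))  # canonical text form: 32 hex digits plus 4 hyphens
```

Either value could then be used as a surrogate key, since it is produced without any coordination with the cluster.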

Page 49: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Indexing

•  This gives you fast access to data
•  Secondary indexes != relational indexes

Page 50: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Adding an Index to a table

•  If you want to query on a column that is not part of your PK, you can create an index:

CREATE INDEX ON <table>(<column>);

•  Then you can do a select:

SELECT * FROM product WHERE type = 'PC';

•  Avoid doing this
   •  Not great for performance (although improvements are being made)
   •  Much more efficient to model your data around the query, i.e. roll your own indexes!

Page 51: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Keyword index example

•  Using the previous video example, users want to tag videos.

•  Video table defined as:

CREATE TABLE videos (
  videoid uuid,
  videoname varchar,
  username varchar,
  description varchar,
  tags varchar,
  upload_date timestamp,
  PRIMARY KEY (videoid)
);

•  Now we can define an index for tagging videos

CREATE TABLE video_tag_index (
  tag varchar,
  videoid uuid,
  timestamp timestamp,
  PRIMARY KEY (tag, videoid)
);

Efficient and fast

Page 52: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Partial word index example

•  Table:

CREATE TABLE email_index (
  domain varchar,
  user varchar,
  username varchar,
  PRIMARY KEY (domain, user)
);

•  User: jmiller, Email: [email protected]

INSERT INTO email_index (domain, user, username)
VALUES ('@datastax.com', 'jmiller', 'jmiller');
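On the client side, building a row for this index amounts to splitting the address at the "@". The Python below is a small sketch of that step; `email_index_row` is a hypothetical helper and the example address is a placeholder, not one taken from the slides.

```python
def email_index_row(email, username):
    """Split an address into the (domain, user) key used by the email_index table."""
    user, _, domain = email.partition("@")
    # The slide stores the domain with its leading "@", so it is kept here as well.
    return {"domain": "@" + domain, "user": user, "username": username}

row = email_index_row("someone@example.com", "someone")
print(row)
```

With the domain as the partition key, all users of one domain live in a single partition and can be listed with one query.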

Page 53: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Bitmap index

•  Multiple parts to a key
•  Create a truth table of the various combinations
•  However, inserts == the number of combinations

Page 54: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Bitmap index example

•  Find a car in a car park by variable combinations

Page 55: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Bitmap index example – Table definition

•  Make a table with three different key combinations

CREATE TABLE car_location_index (
  make varchar,
  model varchar,
  colour varchar,
  vehicle_id int,
  lot_id int,
  PRIMARY KEY ((make, model, colour), vehicle_id)
);

Page 56: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Bitmap index example – Adding records

•  We are pre-optimizing for 7 possible queries of the index on insert.

1.  INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
    VALUES ('Ford', 'Mustang', 'Blue', 1234, 8675309);

2.  INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
    VALUES ('Ford', 'Mustang', '', 1234, 8675309);

3.  INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
    VALUES ('Ford', '', 'Blue', 1234, 8675309);

4.  INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
    VALUES ('Ford', '', '', 1234, 8675309);

5.  INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
    VALUES ('', 'Mustang', 'Blue', 1234, 8675309);

6.  INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
    VALUES ('', 'Mustang', '', 1234, 8675309);

7.  INSERT INTO car_location_index (make, model, colour, vehicle_id, lot_id)
    VALUES ('', '', 'Blue', 1234, 8675309);
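The seven key combinations do not need to be written out by hand. The Python sketch below generates them with `itertools.product`, using an empty string for an unspecified part as the slide does; `index_keys` is a hypothetical helper name.

```python
from itertools import product

def index_keys(make, model, colour):
    """Every combination of known/blank key parts, excluding the all-blank one."""
    options = [(make, ""), (model, ""), (colour, "")]
    return [combo for combo in product(*options) if any(combo)]

keys = index_keys("Ford", "Mustang", "Blue")
for number, key in enumerate(keys, start=1):
    print(number, key)
```

For three key parts this yields 2^3 - 1 = 7 rows per car, which is the write amplification the previous slide warns about.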

Page 57: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Bitmap - selecting

•  Different queries are now possible:

Page 58: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Time Series/Sensor Data

Page 59: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

What is time series data?

•  Sensors
   •  CPU, Network Card, Electronic Power Meter, Resource Utilization, Weather
•  Clickstream data
•  Historical trends
•  Stock Ticker
•  Anything that varies on a temporal basis
•  Top Ten Most Popular Videos

Page 60: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Why Cassandra for time series data?

•  Cassandra is based on the BigTable storage model
•  One key row and lots of (variable) columns
•  Single layout on disk

Page 61: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Time Series Example

•  Storing weather data
•  One weather station
•  Temperature measurement every minute

Page 62: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Times Series Example – query data

•  Weather station id = Locality of single node

Page 63: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Time Series Example - Table

•  Data partitioned by weather station ID and time
•  Timestamp goes in the clustered column
•  Store the measurement as the non-clustered column(s)
•  Take advantage of partition clustering

CREATE TABLE temperature (
  weatherstation_id text,
  event_time timestamp,
  temperature text,
  PRIMARY KEY (weatherstation_id, event_time)
);

Page 64: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Time Series Example

•  Simple to insert:

INSERT INTO temperature (weatherstation_id, event_time, temperature)
VALUES ('1234abcd', '2013-12-11 07:01:00', '72F');

•  Simple to query:

SELECT temperature FROM temperature
WHERE weatherstation_id = '1234abcd'
AND event_time > '2013-04-03 07:01:00' AND event_time < '2013-04-03 07:04:00';

Page 65: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Time Series Example – Partitioning

•  With the previous table, you can end up with a very large row on 1 partition, i.e. PRIMARY KEY (weatherstation_id, event_time)
•  This would have to fit on 1 node.
•  Cassandra can store 2 billion columns per storage row.
•  The solution is to have a composite partition key to split things up:

CREATE TABLE temperature (
  weatherstation_id text,
  date text,
  event_time timestamp,
  temperature text,
  PRIMARY KEY ((weatherstation_id, date), event_time)
);
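With a (weatherstation_id, date) partition key, the client has to derive the date value for each write and work out which daily partitions a time-range read touches. The Python below is a sketch of both steps, assuming dates are formatted as YYYY-MM-DD text as in the slides; both helper names are hypothetical.

```python
from datetime import datetime, timedelta

def partition_key(weatherstation_id, event_time):
    """Build the (weatherstation_id, date) partition key for one reading."""
    return (weatherstation_id, event_time.strftime("%Y-%m-%d"))

def partitions_for_range(weatherstation_id, start, end):
    """List every daily partition that a query from start to end would have to read."""
    days = (end.date() - start.date()).days
    return [
        (weatherstation_id, (start.date() + timedelta(days=d)).isoformat())
        for d in range(days + 1)
    ]

print(partition_key("1234abcd", datetime(2013, 12, 11, 7, 1)))
print(partitions_for_range("1234abcd",
                           datetime(2013, 12, 10, 23, 0),
                           datetime(2013, 12, 12, 1, 0)))
```

A query that stays within one day reads a single partition; a query that crosses midnight needs one read per day.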

Page 66: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Time Series Example – reading and writing

•  Simple to insert:

INSERT INTO temperature (weatherstation_id, date, event_time, temperature)
VALUES ('1234abcd', '2013-12-11', '2013-12-11 07:01:00', '72F');

•  Simple to query:

SELECT temperature FROM temperature
WHERE weatherstation_id = '1234abcd'
AND date = '2013-12-11'
AND event_time > '2013-12-11 07:01:00' AND event_time < '2013-12-11 07:04:00';

Page 67: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Time Series Example – reverse ordering

•  Common pattern for time series data is rolling storage.
•  For example, we only want to show the last 10 temperature readings and older data is no longer needed
•  On most DBs you would need some background job to purge the old data.
•  With Cassandra you can use TTLs!

CREATE TABLE temperature (
  weatherstation_id text,
  date text,
  event_time timestamp,
  temperature text,
  PRIMARY KEY ((weatherstation_id, date), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);

•  As part of the table definition, WITH CLUSTERING ORDER BY (event_time DESC) is used to order the data by the most recent first, i.e. the data will be returned in this order.

Page 68: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Time Series Example – TTL’ing

•  Simple to insert:

INSERT INTO temperature (weatherstation_id, date, event_time, temperature)
VALUES ('1234abcd', '2013-12-11', '2013-12-11 07:01:00', '72F') USING TTL 20;

•  This data point will automatically be deleted after 20 seconds.
•  Eventually you will see all the data disappear.
•  Simple to query:

SELECT temperature FROM temperature
WHERE weatherstation_id = '1234abcd'
AND date = '2013-12-11'
AND event_time > '2013-12-11 07:01:00' AND event_time < '2013-12-11 07:04:00';

Page 69: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Time Series Bucket Example – mitigating spikes in data

•  In some situations, there might be a risk that you get an unforeseen volume of sensor data for the partition key for your row.
•  The risk here is that your row will continue to grow and fill up the node.
•  The workaround here is to attempt to split your data across multiple nodes:

CREATE TABLE temperature (
  weatherstation_id text,
  date text,
  bucket_id int,
  event_time timestamp,
  temperature text,
  PRIMARY KEY ((weatherstation_id, date, bucket_id), event_time)
);

Page 70: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Time Series Bucket Example – reading and writing

•  Not so simple to insert. Client needs to generate a bucket id (often a random number within a certain range):

INSERT INTO temperature (weatherstation_id, date, bucket_id, event_time, temperature)
VALUES ('1234abcd', '2013-12-11', 10, '2013-12-11 07:01:00', '72F');

•  Much more expensive to read. The client will have to iterate through the range of random numbers, execute a read for each and then merge and order the data in the client:

SELECT temperature FROM temperature
WHERE weatherstation_id = '1234abcd' AND date = '2013-12-11'
AND bucket_id = 10
AND event_time > '2013-12-11 07:01:00' AND event_time < '2013-12-11 07:04:00';

Page 71: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Time Series Bucket Example

•  Only do this as a last resort.
•  Reads become very expensive: one logical read becomes n reads, where n is the number of buckets.
•  If you're dealing with large volumes of data, it can be hard work for the client to merge and re-order things.


Page 72: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

DataStax Native Java Driver


Page 73: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Features

•  Provides CQL3 access to Cassandra using Java
•  Utilizes Cassandra's native protocol
•  Automatic routing of client requests
•  Configurable consistency policy
•  Automatic failover
•  Tracing support
•  Tunable policies: load balancing, reconnection, consistency
•  Queries can be executed synchronously or asynchronously
•  Supports prepared statements
•  Non-blocking I/O


Page 74: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Cassandra clients - Drivers

•  DataStax drivers for Cassandra:
•  Python
•  C++
•  Java
•  C#
•  And more on the way…

•  http://www.datastax.com/download/clientdrivers


Page 75: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Where to get it?

•  The latest release of the driver is available on Maven Central.
•  You can add it to your application as a Maven dependency (groupId com.datastax.cassandra, artifactId cassandra-driver-core).
•  Documentation: http://www.datastax.com/documentation/developer/java-driver
•  Javadoc: http://www.datastax.com/drivers/java/apidocs/index.html


Page 76: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Native Protocol

•  To use CQL via the client drivers, you must set the property start_native_transport to true in the cassandra.yaml on every node.

•  This protocol is an extremely efficient way of integrating with Cassandra.
•  Supports synchronous and asynchronous requests.
•  Use the corresponding native driver in your app.


Page 77: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

CQL to Java Mappings

CQL3 data type | Java type
---------------+---------------------
ascii          | java.lang.String
bigint         | long
blob           | java.nio.ByteBuffer
boolean        | boolean
counter        | long
decimal        | java.math.BigDecimal
double         | double
float          | float
inet           | java.net.InetAddress
int            | int
list           | java.util.List<T>
map            | java.util.Map<K, V>
set            | java.util.Set<T>
text           | java.lang.String
timeuuid       | java.util.UUID
uuid           | java.util.UUID
varchar        | java.lang.String
varint         | java.math.BigInteger

Page 78: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Connecting to a Cluster

•  The Cluster class is your client app's entry point for connecting to Cassandra and getting back its metadata.

Cluster cluster = Cluster.builder().addContactPoints("10.158.02.40", "10.158.02.44").build();

•  You can pass in one or many node addresses to connect to.
•  Make sure to tidy up your cluster after you're finished: cluster.shutdown();


Page 79: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Connecting to a Keyspace

•  After connecting to the cluster, you create a Session on the keyspace you want to interact with.

Session session = cluster.connect("akeyspace");

•  Make sure to tidy up after yourself:

session.shutdown();


Page 80: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Inserting Data

try {
    session.execute(
        "INSERT INTO user (username, password) " +
        "VALUES ('user1', 'user1password');");
    session.execute(
        "INSERT INTO user (username, password) " +
        "VALUES ('user2', 'user2password');");
} catch (NoHostAvailableException ex) {
    System.out.println("No Host available");
}


Page 81: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Reading Data

try {
    ResultSet result = session.execute(
        "SELECT password FROM user " +
        "WHERE username = 'user2';");
    if (result.isExhausted()) return;
    Row user = result.one();
    System.out.println("Password is: " + user.getString("password"));
} catch (NoHostAvailableException ex) {
    System.out.println("No Host Available");
} catch (QueryValidationException ex) {
    // Thrown for queries that are syntactically incorrect, invalid or unauthorized.
    System.out.println("Syntax error, invalid query or not authorized");
}


Page 82: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Prepared Statements

PreparedStatement statement = session.prepare(
    "INSERT INTO user (username, password) " +
    "VALUES (?, ?);");
BoundStatement boundStatement = new BoundStatement(statement);
try {
    session.execute(boundStatement.bind("user4", "user4password"));
} catch (NoHostAvailableException ex) {
    System.out.println("Host Not Available");
} catch (QueryExecutionException ex) {
    // A valid query could not be executed, e.g. timeout or not enough replicas.
    System.out.println("Requested consistency level not met");
} catch (QueryValidationException ex) {
    // The query itself was rejected.
    System.out.println("Syntax error, invalid query or not authorized");
}


Page 83: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Query Builder

Insert insert = QueryBuilder.insertInto("user")
    .value("username", "rcohen")
    .value("password", "mypassword");
session.execute(insert);

Query query = QueryBuilder
    .select()
    .all()
    .from("akeyspace", "user");
ResultSet rs = session.execute(query);
for (Row row : rs) {
    System.out.println(String.format("%-20s\t%-20s",
        row.getString("username"),
        row.getString("password")));
}


Page 84: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Consistency Level

SimpleStatement simpleStatement = new SimpleStatement(
    "SELECT * FROM USER WHERE username = 'user2';");

// This will show the default consistency level of ConsistencyLevel.ONE
System.out.println("Consistency Level for this request: " + simpleStatement.getConsistencyLevel());

// Now change the consistency level
simpleStatement.setConsistencyLevel(ConsistencyLevel.ALL);

You can also set the consistency level using the QueryBuilder:

Insert insert = QueryBuilder.insertInto("user")
    .value("username", "johnny")
    .value("password", "mypassword");
insert.setConsistencyLevel(ConsistencyLevel.ALL);


Page 85: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Tracing

•  Tracing can help with debugging or analysing how Cassandra is handling your queries.

Query insert = QueryBuilder.insertInto("simplex", "songs")
    .value("id", UUID.randomUUID())
    .value("title", "Golden Brown")
    .value("album", "La Folie")
    .value("artist", "The Stranglers")
    .setConsistencyLevel(ConsistencyLevel.ONE)
    .enableTracing();


Page 86: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Tracing

ResultSet results = getSession().execute(insert);
ExecutionInfo executionInfo = results.getExecutionInfo();

•  This ExecutionInfo object contains information on the hosts it attempted to communicate with, the host it used and a QueryTrace object.

QueryTrace queryTrace = executionInfo.getQueryTrace();

•  With these two objects you can obtain quite a lot of detail on how your query performed.


Page 87: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Tracing

Connected to cluster: xerxes
Simplex keyspace and schema created.
Host (queried): /127.0.0.1
Host (tried): /127.0.0.1
Trace id: 96ac9400-a3a5-11e2-96a9-4db56cdc5fe7

activity                            | timestamp    | source     | source_elapsed
------------------------------------+--------------+------------+---------------
Parsing statement                   | 12:17:16.736 | /127.0.0.1 | 28
Preparing statement                 | 12:17:16.736 | /127.0.0.1 | 199
Determining replicas for mutation   | 12:17:16.736 | /127.0.0.1 | 348
Sending message to /127.0.0.3       | 12:17:16.736 | /127.0.0.1 | 788
Sending message to /127.0.0.2       | 12:17:16.736 | /127.0.0.1 | 805
Acquiring switchLock read lock      | 12:17:16.736 | /127.0.0.1 | 828
Appending to commitlog              | 12:17:16.736 | /127.0.0.1 | 848
Adding to songs memtable            | 12:17:16.736 | /127.0.0.1 | 900
Message received from /127.0.0.1    | 12:17:16.737 | /127.0.0.2 | 34
Message received from /127.0.0.1    | 12:17:16.737 | /127.0.0.3 | 25
Acquiring switchLock read lock      | 12:17:16.737 | /127.0.0.2 | 672
Acquiring switchLock read lock      | 12:17:16.737 | /127.0.0.3 | 525
Appending to commitlog              | 12:17:16.737 | /127.0.0.2 | 692
Appending to commitlog              | 12:17:16.737 | /127.0.0.3 | 541
Adding to songs memtable            | 12:17:16.737 | /127.0.0.2 | 741
Adding to songs memtable            | 12:17:16.737 | /127.0.0.3 | 583
Enqueuing response to /127.0.0.1    | 12:17:16.737 | /127.0.0.3 | 751
Enqueuing response to /127.0.0.1    | 12:17:16.738 | /127.0.0.2 | 950
Message received from /127.0.0.3    | 12:17:16.738 | /127.0.0.1 | 178
Sending message to /127.0.0.1       | 12:17:16.738 | /127.0.0.2 | 1189
Message received from /127.0.0.2    | 12:17:16.738 | /127.0.0.1 | 249
Processing response from /127.0.0.3 | 12:17:16.738 | /127.0.0.1 | 345
Processing response from /127.0.0.2 | 12:17:16.738 | /127.0.0.1 | 377


Page 88: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

OpsCenter


Page 89: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

DataStax OpsCenter

•  DataStax OpsCenter is a browser-based, visual management and monitoring solution for Apache Cassandra and DataStax Enterprise

•  Functionality is also exposed via HTTP APIs


Page 90: Helsinki Cassandra Meetup #2: Introduction to CQL3 and DataModeling

Thank You

We power the big data apps that transform business.
