Cassandra and Clojure
-
Upload
nickmbailey -
Category
Software
-
view
1.017 -
download
1
description
Transcript of Cassandra and Clojure
©2013 DataStax. Do not distribute without consent.©2013 DataStax. Do not distribute without consent.
Nick Bailey
OpsCenter Architect
Cassandra and Clojure
Who am I?• OpsCenter Architect
• Monitoring/management tool for Cassandra
• Organizer of Austin Cassandra Users• http://www.meetup.com/Austin-Cassandra-Users/
• Third Thursday each month. Come join!
• Working with Cassandra for 4 years
Cassandra - An introduction
Cassandra - Intro
• Based on Amazon Dynamo and Google BigTable papers
• Shared nothing
• Distributed
• Predictable scaling
Dynamo
BigTable
Users
33
Cassandra - Architecture
Cassandra - Cluster Architecture
• All nodes participate in a cluster
• Shared nothing
• Add or remove as needed
• More capacity? Add a server
Cassandra - Data Distribution
75
0
25
50
• Each node owns 1 or more “tokens”
• Each piece of data has a “partition key”
• Partition key is hashed to determine token
• Hashes:
• Murmur3 (default)
• Md5
Cassandra - Replication
• Client writes to any node
• Node coordinates with replicas
• Data replicated in parallel
• Replication factor (RF): How many copies of your data?
Cassandra - Failure Modes
• Consistency level
• How many nodes?
• ONE/QUORUM/ALL
Cassandra - Geographically Distributed
• Client writes local
• Data syncs across WAN
• Replication Factor per DC
• Consistency Level
• LOCAL_QUORUM
Datacenter East Datacenter West
Data Modeling - Concepts
CQL• Cassandra Query Language
• SQL-like
• Not Relational
Terminology• Keyspace
• Table (Column Family)
• Row
• Column
• Partition Key
• Clustering Key
Data Typescqlsh:clojure_cassandra_demo> help types
CQL types recognized by this version of cqlsh:
ascii bigint blob boolean counter decimal double float inet int list map set text timestamp timeuuid uuid varchar varint
Advanced Concepts• Lightweight Transactions
• Atomic Batches
• User Defined Types (coming soon)
Data Modeling - An Example
Approaching Data Modeling• Model your queries, not your data
• Generally, optimize for reads
• Denormalize!
• Iterate!
Basic Last.fm Clone• See songs that user X has listened to recently
• See user X’s favorite songs in a specific month
• See who has recently listened to artist Y
• See artist Y’s most popular songs in a specific week
Basic Last.fm Clone• See songs that user X has listened to recently
• One of the most common patterns/data models
• Time series
• Immutable (good fit for Clojure!)
Basic Last.fm Clone• See songs that user X has listened to recently
SELECT song, artist, played_at FROM user_history WHERE username = ‘nickmbailey’ORDER BY played_at DESC;
• Partition key = ‘username’
• Clustering key = ‘played_at’
Basic Last.fm Clone• See songs that user X has listened to recently
CREATE TABLE user_history ( username text, played_at timestamp, album text, artist text, song text, PRIMARY KEY (username, played_at)) WITH CLUSTERING ORDER BY (played_at DESC)
Basic Last.fm Clone• See songs that user X has listened to recently
• This table has a “bad” partition key
CREATE TABLE user_history ( username text, played_at timestamp, album text, artist text, song text, PRIMARY KEY (username, played_at)) WITH CLUSTERING ORDER BY (played_at DESC)
Basic Last.fm Clone• See songs that user X has listened to recently
• Much better partition key
CREATE TABLE user_history ( username text, year_and_month text, played_at timestamp, album text, artist text, song text, PRIMARY KEY ((username, year_and_month), played_at)) WITH CLUSTERING ORDER BY (played_at DESC)
Basic Last.fm Clone• See songs that user X has listened to recently
cqlsh:clojure_cassandra_demo> select * from user_history limit 5;
username | year_and_month | played_at | album | artist | song-------------+----------------+--------------------------+--------------------------+--------------------------+------------------------- nickmbailey | 2014-06 | 2014-06-30 17:13:54-0500 | Once More 'Round The Sun | Mastodon | Halloween nickmbailey | 2014-06 | 2014-06-30 17:08:53-0500 | Once More 'Round The Sun | Mastodon | Ember City b_hastings | 2014-06 | 2014-06-30 12:57:12-0500 | Buena Vista Social Club | Buena Vista Social Club | Chan Chan zack_smith | 2014-07 | 2014-07-30 12:49:35-0500 | Awake Remix | Tycho | Awake (Com Truise Remix) zack_smith | 2014-03 | 2014-03-30 12:44:50-0500 | Awake Remix | Tycho | Awake
Partition Key - unordered Clustering Key - Ordered
Basic Last.fm Clone• See user X’s favorite songs in a specific month
SELECT song, artist, play_count FROM user_history WHERE username = ‘nickmbailey’ AND month = ‘July’ORDER BY play_count DESC;
• Partition key = ‘username’, ‘month’
• Clustering key = ‘play_count’?
• Counters are a special case
Counters• Counter can not be part of the PRIMARY KEY
• No ordering based on counter value
• All non counter columns must be part of the PRIMARY KEY
• Limitations due to the storage format
Basic Last.fm Clone• See user X’s favorite songs in a specific month
CREATE TABLE user_song_counts ( username text, year_and_month text, artist text, song text, play_count counter, PRIMARY KEY ((username, year_and_month), artist, song))
Basic Last.fm Clone• See user X’s favorite songs in a specific month
• Results unordered• Client will have to do the sorting
cqlsh:clojure_cassandra_demo> select * from user_song_counts where username = 'nickmbailey' and year_and_month = '2014-07';
username | year_and_month | artist | song | count-------------+----------------+----------+-----------------------------------+------- nickmbailey | 2014-07 | Amos Lee | Tricksters, Hucksters, And Scamps | 10 nickmbailey | 2014-07 | Beck | Blackbird Chain | 1 nickmbailey | 2014-07 | Beck | Blue Moon | 4 nickmbailey | 2014-07 | Cherub | <3 | 12 nickmbailey | 2014-07 | Cherub | Chocolate Strawberries | 6
Basic Last.fm Clone• See who has recently listened to artist Y
CREATE TABLE artist_history ( artist text, year_and_week text, played_at timestamp, album text, song text, username text, PRIMARY KEY ((artist, year_and_week), played_at)) WITH CLUSTERING ORDER BY (played_at DESC)
Basic Last.fm Clone• See artist Y’s most popular songs in a specific week
CREATE TABLE artist_song_counts ( artist text, year_and_week text, album text, song text, play_count counter, PRIMARY KEY ((artist, year_and_week), album, song))
Cassandra from Clojure
Building Blocks
• Java Driver
• Hayt
Java Driver
• Fully featured
• Connection pooling
• Failover policies
• Retry policies
• Sync and Async interfaces
• Exposes client metrics
• https://github.com/datastax/java-driver
Hayt
• CQL DSL
• Similar to Korma
• Solely for building CQL strings
• https://github.com/mpenet/hayt
(select :foo (where { :bar 1
:baz 2)})
(->raw (select :foo (where {:bar 1 :baz 2)}))> "SELECT * FROM foo WHERE bar = 1 AND baz = 2;"
Clients
• Alia
• https://github.com/mpenet/alia
• Cassaforte
• https://github.com/clojurewerkz/cassaforte
• Both built on Java Driver and Hayt
• Not particularly different
Alia vs. Cassaforte
Cassaforte(let [conn (cc/connect ["127.0.0.1"])] (cql/create-keyspace conn "cassaforte_keyspace" (with {:replication {:class "SimpleStrategy" :replication_factor 1 }})))
Alia(def cluster (alia/cluster {:contact-points ["localhost"]}))(def session (alia/connect cluster))(alia/execute session
(create-keyspace :alia (if-exists false) (with {:replication {:class "SimpleStrategy" :replication_factor 1}})))
Learn by Example - Alia
Cluster Object
• Entry point
• Configures relevant client options
• :contact-points
• :load-balancing-policy
• :reconnection-policy
• :retry-policy
• and more!
(def cluster (alia/cluster {:contact-points ["localhost"]}))
Session Object
• A Session is associated with a keyspace
• Allows interacting with multiple keyspaces
(def cluster (alia/cluster {:contact-points [“localhost"]}))(def session (alia/connect cluster))(def session (alia/connect cluster) :my_keyspace)
Querying
• Multiple ways to query
• alia/execute
• Synchronous, block on result
• alia/execute-async
• Returns a Lamina result-channel (basically, a promise)
• Optional success/error callbacks
• alia/execute-chan
• Returns a core.async channel
• We won’t dive in to core.async now
Prepared Statements
• Statements can be prepared server side
• Better performance for common queries
(def prepared-statement (alia/prepare session "select * from users where user_name=?;"))
What else?
• See github and docs
• https://github.com/mpenet/alia
• http://mpenet.github.io/alia/qbits.alia.html
Demo
Demo
• https://github.com/nickmbailey/clojure-cassandra-demo
• Built with
• CCM - https://github.com/pcmanus/ccm
• Alia - https://github.com/mpenet/alia
• ring - https://github.com/ring-clojure/ring
• compojure - https://github.com/weavejester/compojure
• hiccup - https://github.com/weavejester/hiccup
• least - https://github.com/Raynes/least
MoreCassandra: http://cassandra.apache.org
DataStax Drivers: https://github.com/datastax
Documentation: http://www.datastax.com/docs
Getting Started: http://www.datastax.com/documentation/gettingstarted/index.html
Developer Blog: http://www.datastax.com/dev/blog
Cassandra Community Site: http://planetcassandra.org
Download: http://planetcassandra.org/Download/DataStaxCommunityEdition
Webinars: http://planetcassandra.org/Learn/CassandraCommunityWebinars
Cassandra Summit Talks: http://planetcassandra.org/Learn/CassandraSummit
©2013 DataStax Confidential. Do not distribute without consent.©2013 DataStax Confidential. Do not distribute without consent.