Introduction to Cassandra - Denver

download Introduction to Cassandra - Denver

of 15

  • date post

  • Category


  • view

  • download


Embed Size (px)


Here's the slides from the intro talk Luke Tillman and I gave at Cassandra Day Denver.

Transcript of Introduction to Cassandra - Denver

  • 1. Introduction to CassandraJon Haddad, Technical Evangelist, DataStaxLuke Tillman, Language Evangelist, DataStax@rustyrazorblade, @LukeTillman2013 DataStax Confidential. Do not distribute without consent.1

2. High Level Architecture Ring based replication Only 1 type of server (cassandra) All nodes hold data and can answerqueries No SPOF Build for HA & Scalability Multi-DC Data is found by key (CQL) Runs on JVM 3. Hash Ring Key is hashed to a position on thering Data is replicated to RF=N servers Using a snitch (build in feature) youcan ensure replicas are located ondifferent racks / AZ 4. CAP Tradeoffs! Cassandra chooses Availability &Partition Tolerance over Consistency Replication factor (RF=3) Queries have tunable consistency level ALL, QUORUM, ONE 5. The Write Path Writes are written to any node in the cluster(coordinator) Writes are written to commit log, then tomeltable Memtable flushed to disk periodically(sstable) New memtable is created in memory Deletes are actually a special write case,called a tombstone 6. What is an SSTable? Immutable data file Deletes are written as tombstones Every write includes a timestamp of when itwas written Partition is spread across multiple SSTables Same column can be in multiple SSTables Merged through compaction, only latesttimestamp is keptsstablesstable sstable sstable 7. The Read Path Any server may be queried, it acts as thecoordinator Contacts nodes with the requested key On each node, data is pulled fromSSTables and merged Consistency< ALL performs read repairin background (read_repair_chance) 8. Data Structures Like an RDBMS, Cassandra uses a Table tostore data But theres where the similarities end Partitions within tables Rows within partitions (or a single row) CQL to create tables & query data Partition keys determine where a partitionis found Clustering keys determine ordering of rowswithin a partitionKeyspaceTablePartitionRow 9. Example: Single Row Partition Simple User system Identified by name (pk) 1 Row per partition This is familiar territoryname age jobjon 33 evangelistluke 33 evangelistold pete 108 retireds. seagal 62 actorJCVD 53 actorcqlsh:demo> select * from user WHERE name = 'JCVD'cqlsh:demo> create table user(name text primary key,age int,job text); 10. Example: Multiple Rows Comments on photos Comments are always selected bythe photo_id There are only 4 rows in 2 partitions In the real world, use UUIDs insteadof int for PKphoto_id comment_id user comment5 1 jon hi5 2 luke oh hey5 3 JCVD AHHHHH!!!6 4 jon great picselect * from comment where photo_id=5create table comment( photo_id int,comment_id int,user text,comment text,primary key (photo_id, comment_id)); 11. Partition with Clustering Multiple rows are transposed into a single partition Partitions vary in size Old terminology - "wide row"photo_id comment_id name comment comment_id user comment comment_id user comment5 1 jon hi 2 luke oh hey 3 JCVD AHHHHH!!!6 4 jon great pic 12. CQL Data TypesBasic Types Collectionstext uuid counter mapint timeuuid listdecimal setblobRead the CQL documentation for the full list of types 13. Pro Data Modeling How do I query my data if I can onlyquery by key? Denormalize! Create multiple views into your data(multiple tables) Cassandra is built for fast writes Use fast writes to do as few reads aspossible 14. Lets Download Cassandra! click downloads use drop down to pick OS 15. 2013 DataStax Confidential. Do not distribute without consent. 15