Introduction to Apache Cassandra
Transcript of Introduction to Apache Cassandra
- 1. Introduction to Apache Cassandra
Luke Tillman (@LukeTillman), Language Evangelist at DataStax
2. Who are you?!
Evangelist with a focus on the .NET Community
Long-time Developer
Recently presented at Cassandra Summit 2014 with Microsoft
Very Recent Denver Transplant

3. DataStax and Cassandra
DataStax Enterprise
  Apache Cassandra, now with more QA!
  Easy integrations with Solr, Apache Spark, Hadoop
Dev and Ops Tooling
  DevCenter IDE, OpsCenter
Open source drivers
  Java, C#, Python, C++, Ruby, NodeJS

4. Unlimited, free use of DataStax Enterprise
No limit on number of nodes or other hidden restrictions
If you're a startup, it's free.
Requirements: < $2M annual revenue, < $20M capital raised
www.datastax.com/startups

5. (1) What is Cassandra? (2) How does it work? (3) Cassandra Query Language (CQL) (4) Who's using it? (5) Questions

6. What is Cassandra?

7. What is Cassandra?
A Linearly Scaling and Fault Tolerant Distributed Database
Fully Distributed
  Data spread over many nodes
  All nodes participate in a cluster
  All nodes are equal
  No SPOF (shared nothing)

8. What is Cassandra?
Linearly Scaling
  Have More Data? Add more nodes.
  Need More Throughput? Add more nodes.
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

9. What is Cassandra?
Fault Tolerant
  Nodes Down != Database Down
  Datacenter Down != Database Down

10. What is Cassandra?
Fully Replicated
  Clients write local
  Data syncs across WAN
  Replication Factor per DC
[Diagram: Client, US, Europe]

11. Cassandra and the CAP Theorem
The CAP Theorem limits what distributed systems can do
  Consistency
  Availability
  Partition Tolerance
Limits? Pick 2 out of 3

12. Cassandra and the CAP Theorem
Consistency: When I ask the same question to any part of the system, I should get the same answer
[Illustration: "Is he guilty yet?" "No." "No." "No." Consistent]

13. Cassandra and the CAP Theorem
Consistency: When I ask the same question to any part of the system, I should get the same answer
[Illustration: "Is he guilty yet?" "No." "Yes." "Yes." Not Consistent]

14. Cassandra and the CAP Theorem
Availability: When I ask a question, I will get an answer
[Illustration: "Is he guilty yet?" "Yes." Available]

15. 
Cassandra and the CAP Theorem
Availability: When I ask a question, I will get an answer
[Illustration: "Is he guilty yet?" "I don't know, we have to wait for Dreamy to wake up." Not Available]

16. Cassandra and the CAP Theorem
Partition Tolerance: I can ask questions even when the system is having intra-system communication problems.
[Illustration (Team Tyrion, Team Cersei): "Is he guilty yet?" "No." Tolerant]

17. Cassandra and the CAP Theorem
Partition Tolerance: I can ask questions even when the system is having intra-system communication problems.
[Illustration (Team Tyrion, Team Cersei): "Is he guilty yet?" "I'm not sure without asking them and we're not speaking (I'm pretty sure that one helped kill my sister)." Not Tolerant]

18. Cassandra and the CAP Theorem
Cassandra is an AP system that is Eventually Consistent
[Illustration: "Is he guilty yet?" "No." "Wait, he's going to take the black. Yes." "No." Eventually Consistent]

19. Cassandra and the CAP Theorem
Cassandra is an AP system that is Eventually Consistent
[Illustration: "Is he guilty yet?" "Yes." "Yes." "Yes." Eventually Consistent]

20. How does it work?

21. Two knobs control Cassandra fault tolerance
Replication Factor (server side)
  How many copies of the data should exist?
[Diagram: Client and nodes A, B, C, D; Write A; RF=3]

22. Two knobs control Cassandra fault tolerance
Consistency Level (client side)
  How many replicas do we need to hear from before we acknowledge?
[Diagram: Client and nodes A, B, C, D, shown twice: Write A with CL=QUORUM, and Write A with CL=ONE]

23. Consistency Levels
Applies to both Reads and Writes (i.e. is set on each query)
  ONE - one replica from any DC
  LOCAL_ONE - one replica from local DC
  QUORUM - 51% of replicas from any DC
  LOCAL_QUORUM - 51% of replicas from local DC
  ALL - all replicas
  TWO

24. Consistency Level and Speed
How many replicas we need to hear from can affect how quickly we can read and write data in Cassandra
[Diagram: Client and nodes A, B, C, D; Read A (CL=QUORUM); replica acks of 5 µs, 300 µs, 12 µs, 12 µs]

25. 
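The consistency levels listed on slide 23 can be turned into a count of replica acknowledgements. The following is a minimal Python sketch under stated assumptions: the helper names `required_acks` and `replicas_that_can_be_down` are hypothetical (they are not part of any Cassandra driver), only a single datacenter is modeled so the LOCAL_* levels behave like their non-local counterparts, and QUORUM is computed as a strict majority, which is what the slide's "51%" shorthand amounts to.

```python
def required_acks(consistency_level: str, replication_factor: int) -> int:
    """Replicas that must respond before the coordinator acknowledges.

    Hypothetical helper for illustration; single-datacenter simplification.
    """
    quorum = replication_factor // 2 + 1  # strict majority of the replicas
    levels = {
        "ONE": 1,
        "LOCAL_ONE": 1,
        "TWO": 2,
        "QUORUM": quorum,
        "LOCAL_QUORUM": quorum,
        "ALL": replication_factor,
    }
    needed = levels[consistency_level]
    if needed > replication_factor:
        raise ValueError("consistency level needs more replicas than RF provides")
    return needed


def replicas_that_can_be_down(consistency_level: str, replication_factor: int) -> int:
    """How many replicas may be unreachable while the query can still succeed."""
    return replication_factor - required_acks(consistency_level, replication_factor)


if __name__ == "__main__":
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, required_acks(level, 3), replicas_that_can_be_down(level, 3))
```

With RF=3 this gives one, two, and three required acknowledgements for ONE, QUORUM, and ALL, which is why a lower level tends to respond faster: the coordinator waits for fewer replicas.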
Consistency Level and Availability
Consistency Level choice affects availability
For example, QUORUM can tolerate one replica being down and still be available (in RF=3)
[Diagram: Client and nodes A, B, C, D; Read A (CL=QUORUM); replicas hold A=2, A=2, A=2]

26. Consistency Level and Eventual Consistency
Cassandra is an AP system that is Eventually Consistent so replicas may disagree
Column values are timestamped
In Cassandra, Last Write Wins (LWW)
[Diagram: Client and nodes A, B, C, D; Read A (CL=QUORUM); replicas hold A=2 (Newer), A=1 (Older), A=2]
Christos from Netflix: "Eventual Consistency != Hopeful Consistency" https://www.youtube.com/watch?v=lwIA8tsDXXE

27. Writes in the cluster
Fully distributed, no SPOF
Node that receives a request is the Coordinator for request
Any node can act as Coordinator
[Diagram: Client and nodes A, B, C, D; Write A (CL=ONE); Coordinator Node]

28. Writes in the cluster - Data Distribution
Partition Key determines node placement
[Diagram: rows keyed by Partition Key: id='pmcfadin', lastname='McFadin'; id='jhaddad', firstname='Jon', lastname='Haddad'; id='ltillman', firstname='Luke', lastname='Tillman']
    CREATE TABLE users (
        id text,
        firstname text,
        lastname text,
        PRIMARY KEY (id)
    );

29. Writes in the cluster - Data Distribution
The Partition Key is hashed using a consistent hashing function (Murmur 3) and the output is used to place the data on a node
The data is also replicated to RF-1 other nodes
[Diagram: Partition Key id='ltillman' (firstname='Luke', lastname='Tillman'); Murmur3 of id: ltillman gives A; nodes A, B, C, D; RF=3]

30. Hashing - Back to Reality
Back in reality, Partition Keys actually hash to 128 bit numbers
Nodes in Cassandra own token ranges (i.e. hash ranges)
    Range   Start           End
    A       0xC000000..1    0x0000000..0
    B       0x0000000..1    0x4000000..0
    C       0x4000000..1    0x8000000..0
    D       0x8000000..1    0xC000000..0
[Diagram: Partition Key id='ltillman'; Murmur3 gives 0xadb95e99da887a8a4cb474db86eb5769]

31. Writes on a single node
Client makes a write request
    UPDATE users SET firstname = 'Luke' WHERE id = 'ltillman'
[Diagram: Client; Memory; Disk]

32. 
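The token-range table on slide 30 and the "replicated to RF-1 other nodes" rule on slide 29 can be sketched in a few lines of Python. This is an illustration under assumptions, not Cassandra's implementation: the names `RING`, `token_for`, and `replicas_for_token` are hypothetical, MD5 is used as a standard-library stand-in for Murmur3 (so it produces different tokens than Cassandra would), and placing the extra replicas on the next nodes clockwise is an assumed, simplified placement rule.

```python
import bisect
import hashlib
from typing import List

# Ring from slide 30: each node owns the range that ends at its token.
# A ends at 0x000..0, B at 0x400..0, C at 0x800..0, D at 0xC00..0 (128-bit values).
RING = [
    (0x0 << 124, "A"),
    (0x4 << 124, "B"),
    (0x8 << 124, "C"),
    (0xC << 124, "D"),
]
TOKENS = [token for token, _ in RING]


def token_for(partition_key: str) -> int:
    """128-bit token for a key. MD5 is only a stand-in for Murmur3 here."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def replicas_for_token(token: int, rf: int = 3) -> List[str]:
    """Owner of the token plus the next rf-1 nodes clockwise (assumed placement)."""
    # First node whose end token is >= the token; past the last node, wrap to A.
    owner = bisect.bisect_left(TOKENS, token) % len(RING)
    return [RING[(owner + i) % len(RING)][1] for i in range(rf)]


if __name__ == "__main__":
    slide_token = 0xADB95E99DA887A8A4CB474DB86EB5769  # hash shown on slide 30
    print(replicas_for_token(slide_token))
    print(replicas_for_token(token_for("ltillman")))  # stand-in hash, other token
```

The hash value shown on slide 30 falls between 0x8000000..1 and 0xC000000..0, so in this sketch it lands on node D, with A and B holding the other two copies at RF=3.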
Writes on a single node
Data is appended to the Commit Log
Cassandra writes are FAST due to log appended storage
    UPDATE users SET firstname = 'Luke' WHERE id = 'ltillman'
[Diagram: Client; Memory; Disk: Commit Log (id='ltillman', firstname='Luke')]

33. Writes on a single node
Data is written to Memtable
    UPDATE users SET firstname = 'Luke' WHERE id = 'ltillman'
[Diagram: Client; Memory: Memtable for Users (id='ltillman', firstname='Luke', lastname='Tillman'), Some Other Memtable; Disk: Commit Log (id='ltillman', firstname='Luke')]

34. Writes on a single node
Server acknowledges to client
    UPDATE users SET firstname = 'Luke' WHERE id = 'ltillman'
[Diagram: Client; Memory: Memtable for Users (id='ltillman', firstname='Luke', lastname='Tillman'), Some Other Memtable; Disk: Commit Log (id='ltillman', firstname='Luke')]

35. Writes on a single node
Once Memtable is full, data is flushed to disk as SSTable (Sorted String Table)
    UPDATE users SET firstname = 'Luke' WHERE id = 'ltillman'
[Diagram: Client; Memory: Memtable for Users (id='ltillman', firstname='Luke', lastname='Tillman'), Some Other Memtable; Disk: Data Directory with SSTable #1 for Users, SSTable #2 for Users, Some Other SSTable]

36. Compaction
Compactions merge and unify data in our SSTables
SSTables are immutable, so this is when we consolidate rows
[Diagram: SSTable #1 for Users (id='ltillman', firstname='Lucas' (timestamp=Older), lastname='Tillman') and SSTable #2 for Users (id='ltillman', firstname='Luke' (timestamp=Newer)) are merged into SSTable #3 for Users (id='ltillman', firstname='Luke', lastname='Tillman')]

37. Reads in the cluster
Same as writes in the cluster, reads are coordinated
Any node can be the Coordinator Node
[Diagram: Client and nodes A, B, C, D; Read A (CL=QUORUM); Coordinator Node]

38. Reads on a single node
Client makes a read request
    SELECT firstname, lastname FROM users WHERE id = 'ltillman'
[Diagram: Client; Memory; Disk]

39. 
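The single-node write path on slides 32 to 35 (append to the commit log, update the memtable, acknowledge, flush a full memtable to an SSTable) can be mimicked with a toy in-memory model. This is a sketch under assumptions rather than Cassandra's storage engine: the `Node` class, its `memtable_limit` parameter, and the integer timestamps are hypothetical, "full" is counted in partitions instead of bytes, and the flush runs inline before the acknowledgement for simplicity.

```python
from typing import Dict, List, Tuple

Cell = Tuple[str, int]  # (value, write timestamp)


class Node:
    """Toy model of one node's write path; not Cassandra's real storage engine."""

    def __init__(self, memtable_limit: int = 2) -> None:
        self.commit_log: List[Tuple[str, Dict[str, str], int]] = []  # append-only
        self.memtable: Dict[str, Dict[str, Cell]] = {}               # in memory
        self.sstables: List[tuple] = []                              # immutable
        self.memtable_limit = memtable_limit  # "full" threshold, in partitions

    def write(self, key: str, columns: Dict[str, str], timestamp: int) -> str:
        # 1. Append the mutation to the commit log (sequential, durable record).
        self.commit_log.append((key, dict(columns), timestamp))
        # 2. Apply it to the memtable, keeping the newest value per column.
        row = self.memtable.setdefault(key, {})
        for name, value in columns.items():
            if name not in row or timestamp >= row[name][1]:
                row[name] = (value, timestamp)
        # 3. Flush once the memtable is "full" (inline here for simplicity).
        if len(self.memtable) >= self.memtable_limit:
            self.flush()
        # 4. Acknowledge to the client.
        return "ack"

    def flush(self) -> None:
        """Write the memtable out as an immutable SSTable sorted by partition key."""
        sstable = tuple(
            (key, tuple(sorted(cells.items())))
            for key, cells in sorted(self.memtable.items(), key=lambda kv: kv[0])
        )
        self.sstables.append(sstable)
        self.memtable = {}


if __name__ == "__main__":
    node = Node(memtable_limit=2)
    node.write("ltillman", {"firstname": "Luke", "lastname": "Tillman"}, 1)
    node.write("jhaddad", {"firstname": "Jon", "lastname": "Haddad"}, 2)
    print(len(node.commit_log), len(node.sstables), node.memtable)
```

Nothing in `write` reads existing data from disk first, which is the point the slides make about log-appended storage: a write is an append plus an in-memory update.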
Reads on a single node
Data is read from (possibly multiple) SSTables and merged
Reads in Cassandra are also FAST but are limited by Disk IO
    SELECT firstname, lastname FROM users WHERE id = 'ltillman'
[Diagram: Client; Memory; Disk: SSTable #1 for Users (id='ltillman', firstname='Lucas' (timestamp=Older), lastname='Tillman'), SSTable #2 for Users (id='ltillman', firstname='Luke' (timestamp=Newer)); merged result: firstname='Luke', lastname='Tillman']

40. Reads on a single node
Any unflushed Memtable data is also merged
    SELECT firstname, lastname FROM users WHERE id = 'ltillman'
[Diagram: Client; Memory: Memtable for Users; Disk; result: firstname='Luke', lastname='Tillman']

41. Reads on a single node
Client gets acknowledgement with the data
    SELECT firstname, lastname FROM users WHERE id = 'ltillman'
[Diagram: Client; Memory; Disk; result: firstname='Luke', lastname='Tillman']

42. Compaction - Revisited
Compactions merge and unify data in our SSTables, making them important to reads (fewer SSTables = less to read/merge)
[Diagram: SSTable #1 for Users (id='ltillman', firstname='Lucas' (timestamp=Older), lastname='Tillman') and SSTable #2 for Users (id='ltillman', firstname='Luke' (timestamp=Newer)) are merged into SSTable #3 for Users (id='ltillman', firstname='Luke', lastname='Tillman')]

43. Cassandra Query Language (CQL)

44. Data Structures
Keyspace is like RDBMS Database or Schema
Like RDBMS, Cassandra uses Tables to store data
Partitions can have one row (narrow) or multiple rows (wide)
[Diagram: Keyspace > Tables > Partitions > Rows]

45. Schema Definition (DDL)
Easy to define tables for storing data
First part of Primary Key is the Partition Key
    CREATE TABLE videos (
        videoid uuid,
        userid uuid,
        name text,
        description text,
        tags set<text>,
        added_date timestamp,
        PRIMARY KEY (videoid)
    );

46. Schema Definition (DDL)
One row per partition (familiar)
    CREATE TABLE videos (
        videoid uuid,
        userid uuid,
        name text,
        description text,
        tags set<text>,
        added_date timestamp,
        PRIMARY KEY (videoid)
    );
    videoid      name                   ...
    689d56e5-    Keyboard Cat           ...
    93357d73-    Nyan Cat               ...
    d978b136-    Original Grumpy Cat    ...

47. 
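The merge step that the read path (slides 39 to 41) and compaction (slide 42) both rely on comes down to keeping the newest timestamped value of each column. A minimal Python sketch follows, assuming integer timestamps (the slides only say "Older" and "Newer") and a hypothetical helper named `merge_rows`; when timestamps tie, this sketch simply keeps the first value seen, which is a simplification.

```python
from typing import Dict, Tuple

Cell = Tuple[str, int]   # (value, write timestamp)
Row = Dict[str, Cell]    # column name -> cell


def merge_rows(*versions: Row) -> Row:
    """Merge versions of one row, keeping the newest cell per column (LWW)."""
    merged: Row = {}
    for row in versions:
        for column, (value, timestamp) in row.items():
            if column not in merged or timestamp > merged[column][1]:
                merged[column] = (value, timestamp)
    return merged


# Fragments of the row id='ltillman' as drawn on slide 39 (timestamps assumed).
sstable_1 = {"firstname": ("Lucas", 1), "lastname": ("Tillman", 1)}  # older
sstable_2 = {"firstname": ("Luke", 2)}                               # newer
memtable: Row = {}                                                   # nothing unflushed

result = merge_rows(sstable_1, sstable_2, memtable)
answer = {column: value for column, (value, _) in result.items()}

if __name__ == "__main__":
    print(answer)
```

A read runs this merge on every request across whatever SSTables hold pieces of the row, while compaction runs it once and writes the result to a new SSTable, so later reads have fewer files to consult.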
Clustering Columns
Second part of Primary Key is Clustering Columns
Clustering columns affect ordering of data (on disk)
Multiple rows per partition
    CREATE TABLE comments_by_video (
        videoid uuid,
        commentid timeuuid,
        userid uuid,
        comment text,
        PRIMARY KEY (videoid, commentid)
    ) WITH CLUSTERING ORDER BY (commentid DESC);

48. Clustering Columns - Wide Rows (Partitions)
Use of Clustering Columns is where the term "Wide Rows" comes from
videoid='0fe6a...' userid='ac346...' comment='Awesome!' commentid='82be1...' (10/1/2014 9:36AM) userid='f89d3.