Cassandra at no_sql

Apache Cassandra: NoSQL, Yes to Scale! srisatish ambati @srisatish

description

SFJava, SFNoSQL, SFMySQL, Marakana & Microsoft come together for an evening of presentations on three NoSQL technologies: Apache Cassandra, MongoDB, and Hadoop. This talk lays out a few talking points for Apache Cassandra.

Transcript of Cassandra at no_sql

Page 1: Cassandra at no_sql

Apache Cassandra: NoSQL, Yes to Scale!

srisatish ambati @srisatish

Page 2: Cassandra at no_sql

NoSQL - Know your queries.

Page 3: Cassandra at no_sql

points

• Usecases
• Why Cassandra?
• Usecase: Hadoop, Brisk
• FUD: Consistency
• Why Facebook is not using Cassandra?
• Community, Code, Tools
• Q&A

Page 4: Cassandra at no_sql

Users. Netflix.
Key by Customer, read-heavy
Key by Customer:Movie, write-heavy

Page 5: Cassandra at no_sql

TimeSeries: (several customers)
periodic readings: dev0, dev1…
deviceID:metric:timestamp -> value

Metrics typically way larger dataset than users.
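
The deviceID:metric:timestamp -> value layout above can be sketched in plain Python. This is only an illustration of why a composite key makes range reads cheap, not Cassandra client code; the names `store`, `make_key`, `write` and `readings_for`, and the sorted in-memory list standing in for a sorted store, are assumptions made for the example.

```python
import bisect

# Hypothetical in-memory stand-in for a store kept sorted by key.
store = []  # sorted list of (key, value) tuples


def make_key(device_id, metric, timestamp):
    """Build a composite key deviceID:metric:timestamp.

    The timestamp is zero-padded so that string order matches time order.
    """
    return f"{device_id}:{metric}:{timestamp:010d}"


def write(device_id, metric, timestamp, value):
    """Insert one reading, keeping the store sorted by composite key."""
    bisect.insort(store, (make_key(device_id, metric, timestamp), float(value)))


def readings_for(device_id, metric, start, end):
    """Return values with start <= timestamp <= end for one device and metric."""
    lo = bisect.bisect_left(store, (make_key(device_id, metric, start),))
    hi = bisect.bisect_right(
        store, (make_key(device_id, metric, end), float("inf"))
    )
    return [value for _, value in store[lo:hi]]
```

Because all readings for one device and metric sit next to each other in key order, a time-window query is a single contiguous slice rather than a scan.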

Page 6: Cassandra at no_sql

Why Cassandra?

Page 7: Cassandra at no_sql

Operational simplicity peer-to-peer

Page 8: Cassandra at no_sql

Operational simplicity peer-to-peer

Page 9: Cassandra at no_sql

Replication:
Multi-datacenter
Multi-region ec2
Multi-availability zones

Page 10: Cassandra at no_sql

Replication:
Multi-datacenter
Multi-region ec2, aws
Multi-availability zones

dc1 dc2

reads local

Page 11: Cassandra at no_sql

“Movie marathons on Netflix awaiting AWS to come back up.” #ec2disabled

4.21.2011, Amazon Web Services outage:

Page 12: Cassandra at no_sql

Netflix was running on AWS.

4.21.2011, Amazon Web Services outage:

Page 13: Cassandra at no_sql

fast durable writes. fast reads.

Page 14: Cassandra at no_sql

Writes
Sequential, append-only.
~1-5ms
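
A toy version of a sequential, append-only write path might look like the sketch below. It is an assumption-laden illustration, not Cassandra's implementation: the class name `AppendOnlyStore`, the tab-separated log format, and the fsync on every write are all choices made for the example.

```python
import os
import tempfile


class AppendOnlyStore:
    """Toy write path: append to a log file, then update an in-memory table."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.memtable = {}

    def write(self, key, value):
        # Sequential append: no seek and no read-before-write.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(f"{key}\t{value}\n")
            log.flush()
            os.fsync(log.fileno())  # simplification: sync on every write
        self.memtable[key] = value

    def replay(self):
        """Rebuild the in-memory table from the log, e.g. after a restart."""
        self.memtable = {}
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                key, value = line.rstrip("\n").split("\t", 1)
                self.memtable[key] = value  # later entries win


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "commit.log")
    store = AppendOnlyStore(path)
    store.write("a", "1")
    store.write("a", "2")
    recovered = AppendOnlyStore(path)
    recovered.replay()
    print(recovered.memtable)
```

Updates never rewrite earlier log entries; the newest entry for a key simply supersedes the older ones on replay.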

Page 15: Cassandra at no_sql

Reads
Local
Key & row caches, (also, jna-based 0xffheap)
indexes, materialized

Page 16: Cassandra at no_sql

Clients:
cql, thrift
pycassa, phpcassa
hector, pelops
(scala, ruby, clojure)

Page 17: Cassandra at no_sql

Usecase #3: hadoop
Hdfs cassandra hive
Logs stats analytics

Page 18: Cassandra at no_sql

Brisk
Truly peer-to-peer hadoop.

Page 19: Cassandra at no_sql

Namenode decomposition, explained.

Page 20: Cassandra at no_sql
Page 21: Cassandra at no_sql
Page 22: Cassandra at no_sql

Use column families (tables)
inode
sblock

Page 23: Cassandra at no_sql

near-real time hadoop
Low latency: cassandra_dc nodes
Batch Analytics: brisk_dc nodes

Page 24: Cassandra at no_sql

FUD, acronym: fear, uncertainty, doubt.

Page 25: Cassandra at no_sql

Consistency: R + W > N
ORACLE, 2-node: R=1, W=2, N=2, (T=2)
DNS

* N is the replication factor, not to be confused with T = total # of nodes.
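
The R + W > N rule can be checked mechanically. The sketch below is a minimal illustration; the function name `is_strongly_consistent` is hypothetical and the check says nothing about how any particular database enforces it.

```python
def is_strongly_consistent(r, w, n):
    """Return True when every read set overlaps every write set: R + W > N.

    r: replicas that must answer a read
    w: replicas that must acknowledge a write
    n: replication factor (not the total number of nodes)
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    return r + w > n
```

With the slide's 2-node example (R=1, W=2, N=2) the sum is 3 > 2, so any read touches at least one replica that saw the latest write. With R=1, W=1, N=3 the sum is 2, which is not greater than 3, so a read may miss the latest write.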

Page 26: Cassandra at no_sql

Tune-able, flexibility.
For High Consistency:
read:quorum, write:quorum
For High Availability:
high W, low R.
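
To make the quorum setting concrete, here is a small sketch of the arithmetic, assuming a quorum means a simple majority of the N replicas. The helper names `quorum` and `replicas_that_may_fail` are made up for the example.

```python
def quorum(n):
    """Smallest majority of n replicas."""
    if n < 1:
        raise ValueError("replication factor must be at least 1")
    return n // 2 + 1


def replicas_that_may_fail(level, n):
    """Replicas that can be down while `level` replicas can still respond."""
    if not 1 <= level <= n:
        raise ValueError("level must be between 1 and n")
    return n - level
```

For N=3 a quorum is 2, so quorum reads plus quorum writes give 2 + 2 > 3 while still tolerating one replica being down. Choosing W=3 and R=1 also satisfies R + W > N and lets a read succeed with only one replica reachable, at the cost of writes needing all three.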

Page 27: Cassandra at no_sql
Page 28: Cassandra at no_sql

Inbox Search: 600+ cores, 120+ TB (2008)
Went from 100-500m users.

Average NoSQL deployment size: ~6-12 nodes.

Page 29: Cassandra at no_sql

Usecase #5: search
Apache Solr + Cassandra = Solandra

Other inbox/file searches:
xobni, c3

github.com/tjake/solandra

Page 30: Cassandra at no_sql

“Eventual consistency is harder to program.”
mostly immutable data.
complex systems at scale.

Page 31: Cassandra at no_sql

Miscellaneous, Myth: data-loss, partial rows.
writes are durable.

Page 32: Cassandra at no_sql

Three more reasons for Cassandra...

Page 33: Cassandra at no_sql

Tools
AMIs, OpsCenter, DataStax
AppDynamics

Page 34: Cassandra at no_sql

B e a u t i f u l C 0 d e

= new code(); //less is more
~90k.
java.concurrent.
@annotate.
bloomfilters, merkletrees.
non-blocking, staged-event-driven.
bigtable, dynamo.
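
As a rough idea of what a bloom filter does, here is a minimal sketch. It is not Cassandra's implementation: the class name, the SHA-256 based hashing, and the default sizes are assumptions chosen to keep the example short.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(item))
```

A read path can consult such a filter first and skip a file entirely when `might_contain` returns False, which is the kind of saving the technique is typically used for.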

Page 35: Cassandra at no_sql

Current & Future Focus:
Distributed Counters, CQL.
Simple client.
operational smoothening.

compaction.

Page 36: Cassandra at no_sql

Community
Robust. Rapid. #Professional support from DataStax.

engineers: independent, startups, large companies, Rackspace, Twitter, Netflix...

Come join the efforts!

Page 37: Cassandra at no_sql
Page 38: Cassandra at no_sql

Usecase #4: first NoSQL, then scale!
simpledb Cassandra
mongodb Cassandra

Page 39: Cassandra at no_sql
Page 40: Cassandra at no_sql
Page 41: Cassandra at no_sql

Copyright: xkcd

Page 42: Cassandra at no_sql

Copyright: plantoys

… more than one way to do it!

Page 43: Cassandra at no_sql

Summary - high scale peer-to-peer distributed database.

Page 44: Cassandra at no_sql

Q&A
@srisatish