Cassandra at no_sql

  • date post

    15-Jan-2015
  • Category

    Technology

description

SFJava, SFNoSQL, SFMySQL, Marakana & Microsoft come together for a presentation evening of three NoSQL technologies - Apache Cassandra, Mongodb, Hadoop. This talk lays out a few talking points for Apache Cassandra.

Transcript of Cassandra at no_sql

1. Apache Cassandra: NoSQL, Yes to Scale!
srisatishambati
@srisatish

2. NoSQL-
Know your queries.
3. points
Usecases
Why cassandra?
Usecase: Hadoop, Brisk
FUD: Consistency
Why is Facebook not using Cassandra?
Community, Code, Tools
Q&A
4. Users. Netflix.
Key by Customer, read-heavy
Key by Customer:Movie, write-heavy
5. TimeSeries: (several customers)
periodic readings: dev0, dev1
deviceID:metric:timestamp -> value
Metrics typically way larger dataset than users.
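
A minimal sketch of the deviceID:metric:timestamp -> value layout from this slide. The function name `metric_key`, the epoch-seconds timestamp encoding, and the sample values are assumptions made for illustration, and a plain dict stands in for a column family; this is not the Cassandra client API.

```python
from datetime import datetime, timezone


def metric_key(device_id: str, metric: str, ts: datetime) -> str:
    """Build a 'deviceID:metric:timestamp' key.

    Encoding the timestamp as whole epoch seconds is an assumption of this sketch.
    """
    return f"{device_id}:{metric}:{int(ts.timestamp())}"


# A plain dict stands in for the column family (hypothetical sample data).
readings = {}
t0 = datetime(2011, 4, 21, 12, 0, 0, tzinfo=timezone.utc)
readings[metric_key("dev0", "temp", t0)] = 21.5
readings[metric_key("dev1", "temp", t0)] = 19.0
```

Because every reading gets its own key, the metrics dataset grows with devices x metrics x time, which is consistent with the slide's point that metrics are typically far larger than the user data.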
6. Why Cassandra?
7. Operational simplicity
peer-to-peer
8. Operational simplicity
peer-to-peer
9. Replication:
Multi-datacenter
Multi-region ec2
Multi-availability zones
10. reads local
dc1
dc2
Replication:
Multi-datacenter
Multi-region ec2, aws
Multi-availability zones
11. 4.21.2011, Amazon Web Services outage:
Movie marathons on Netflix awaiting AWS to come back up. #ec2disabled
12. 4.21.2011, Amazon Web Services outage:
Netflix was running on AWS.
13. fast durable writes.
fast reads.
14. Writes
Sequential, append-only.
~1-5ms
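
To illustrate why sequential, append-only writes are cheap, here is a toy log in Python. The class name `AppendOnlyLog`, the tab-separated record format, and the fsync-on-every-write policy are all assumptions of this sketch; it does not reproduce Cassandra's actual commit log format or sync settings.

```python
import os
import tempfile


class AppendOnlyLog:
    """Toy commit log: every write is appended to the end of a single file."""

    def __init__(self, path: str):
        self.path = path
        self._fh = open(path, "ab")  # append-only, binary

    def append(self, key: str, value: str) -> None:
        # No seek and no in-place update: the record always goes at the tail.
        self._fh.write(f"{key}\t{value}\n".encode("utf-8"))
        self._fh.flush()
        os.fsync(self._fh.fileno())  # push the record to disk before returning

    def replay(self) -> dict:
        # Rebuild state by reading the log in order; later records win.
        state = {}
        with open(self.path, "rb") as fh:
            for line in fh:
                key, value = line.decode("utf-8").rstrip("\n").split("\t", 1)
                state[key] = value
        return state

    def close(self) -> None:
        self._fh.close()


log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
log = AppendOnlyLog(log_path)
log.append("customer42", "movie=7")
log.append("customer42", "movie=9")
recovered = log.replay()
log.close()
```

An overwrite is just another appended record, so the write path never has to locate and modify older data on disk.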
15. Reads
Local
Key & row caches, (also, jna-based 0xffheap)
indexes, materialized
16. Clients: cql, thrift
pycassa, phpcassa
hector, pelops
(scala, ruby, clojure)
17. Usecase #3: hadoop
Hdfs cassandra hive
Logs stats analytics
18. Brisk
Truly peer-to-peer hadoop.
19. Namenode decomposition, explained.
20. 21. 22. Use column families (tables)
inode
sblock
23. near-real time hadoop
Low latency: cassandra_dc nodes
Batch Analytics: brisk_dc nodes
24. FUD,
acronym: fear, uncertainty, doubt.
25. Consistency: R + W > N
ORACLE, 2-node: R=1, W=2, N=2, (T=2)
DNS
* N is the replication factor, not to be confused with T, the total number of nodes.
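
The R + W > N rule can be checked with a few lines of Python. The function name `reads_overlap_writes` is hypothetical; R, W, and N follow the slide's definitions (replicas read, replicas written, replication factor).

```python
def reads_overlap_writes(r: int, w: int, n: int) -> bool:
    """Return True when every read set must intersect every write set.

    r: replicas consulted per read
    w: replicas acknowledging each write
    n: replication factor (not the total node count T)
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


# The slide's 2-node example: R=1, W=2, N=2.
two_node_example = reads_overlap_writes(r=1, w=2, n=2)

# A weaker setting for comparison: one replica each way out of three.
weak_example = reads_overlap_writes(r=1, w=1, n=3)
```

With R=1, W=2, N=2 the sum is 3 > 2, so any single replica a read touches has already received the write. With R=1, W=1, N=3 the sum is 2, so a read can land on a replica the write has not reached yet.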
26. Tune-able, flexibility.
For High Consistency:
read:quorum, write:quorum
For High Availability:
high W, low R.
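
As a small worked example of the quorum setting above: a quorum is a majority of the N replicas, floor(N/2) + 1. The helper name `quorum` and the table layout are assumptions of this sketch.

```python
def quorum(n: int) -> int:
    """Majority of n replicas: floor(n / 2) + 1."""
    if n < 1:
        raise ValueError("replication factor must be at least 1")
    return n // 2 + 1


# For each replication factor, quorum reads plus quorum writes exceed N.
table = []
for n in range(1, 6):
    q = quorum(n)
    table.append((n, q, q + q > n))
```

For N=3 a quorum is 2, so quorum reads and quorum writes give R + W = 4 > 3 while still tolerating one replica being down.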
27. 28. Inbox Search:
600+cores.120+TB (2008)
Went from 100m to 500m users.
Average NoSQL deployment size: ~6-12 nodes.
29. Usecase #5: search
Apache Solr + Cassandra = Solandra
Other inbox/file Searches:
xobni, c3
github.com/tjake/solandra
30. Eventual consistency is harder to program.
mostly immutable data.
complex systems at scale.
31. Miscellaneous,
Myth: data-loss, partial rows.
writes are durable.
32. Three more reasons for Cassandra...
33. Tools
AMIs, OpsCenter, DataStax
AppDynamics
34. B e a u t i f u l C 0 d e
= new code(); //less is more
~90k.java.concurrent.@annotate.
bloomfilters, merkletrees.
non-blocking, staged-event-driven.
bigtable, dynamo.
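
One of the structures named on this slide, the bloom filter, fits in a few lines. This is a generic sketch and not Cassandra's implementation: the class name, the sizes, and the use of SHA-256 to derive bit positions are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for simplicity

    def _positions(self, key: str):
        # Derive num_hashes positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(key))


bf = BloomFilter()
bf.add("customer42")
present = bf.might_contain("customer42")
```

A negative answer is definitive, which is what lets a read skip on-disk files that cannot hold the requested key.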
35. Current & Future Focus:
Distributed Counters, CQL.
Simple client.
operational smoothening.
compaction.
36. Community
Robust. Rapid. #
Professional support from DataStax.
engineers: independent, startups, large companies, Rackspace, Twitter, Netflix...
Come join the efforts!
37. 38. Usecase #4: first NoSQL, then scale!
simpledb -> Cassandra
mongodb -> Cassandra
39. 40. 41. Copyright: xkcd
42. Copyright: plantoys
more than one way to do it!
43. Summary -
high scale peer-to-peer
distributed database.
44. Q&A
@srisatish