Apache Cassandra architecture internals

  • APACHE CASSANDRA: Architecture & Internals

    BHUVAN RAWAL

    SNAPDEAL.COM

  • CASSANDRA - AN OVERVIEW

    NOSQL-DATABASE.ORG

    > MASSIVELY SCALABLE

    > PARTITIONED ROW STORE

    > MASTERLESS ARCHITECTURE

    > LINEAR SCALABILITY

    > NO SINGLE POINT OF FAILURE

    > MULTIPLE DC SUPPORT OUT OF THE BOX

  • 2008: Open sourced by Facebook on Google Code.

    2009: Became an Apache Incubator project.

    2010: Gained top-level project status at Apache.

  • KEY FEATURES

    GENERAL PURPOSE: Can be adapted for different classes of use cases

    AVAILABLE: Can stay available through the loss of a node, rack, or DC

    DISTRIBUTED: Seamless distribution across datacentres and continents

  • KEY TUNABLES

    JVM Heap & GC Algorithms

    Compaction Strategy

    Key Cache Size

    Row Cache

    Compression Chunk Size

    Speculative Retries

    Throughput vs Latency tuning

  • SOME USERS

    Cassandra is the most popular wide column store (Wikipedia)

    Deployed by 400+ Fortune 500 firms

    667 companies verified on Siftery

    Apple: 100,000+ node deployment

    Netflix: 95% of data on Cassandra

    Uber: 20 Cassandra clusters, soon to be 100

    Spotify: 100+ production clusters

    https://labs.spotify.com/tag/apache-cassandra/

  • PARTITIONER

    Determines how data is distributed across the nodes

    Must be the same across the cluster

    Ordered Partitioner

    Random Partitioner

    Murmur3 Partitioner
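
    The idea of a partitioner can be sketched as a hash function that maps each partition key to a token, plus a ring in which every node owns a token range. The sketch below is illustrative only: it uses MD5 from the standard library (in the spirit of a random partitioner, not Murmur3), and the node names, token values, and the `TokenRing` class are hypothetical.

```python
import bisect
import hashlib

def token_for(key):
    """Hash a partition key to an integer token (MD5-based, for illustration)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")

class TokenRing:
    """Toy ring: each node owns the token range that ends at its own token."""

    def __init__(self, node_tokens):
        # Sort (token, node) pairs so lookups can use binary search.
        self._ring = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._ring]

    def owner(self, key):
        # First node whose token is >= the key's token, wrapping around the ring.
        idx = bisect.bisect_left(self._tokens, token_for(key))
        return self._ring[idx % len(self._ring)][1]

MAX_TOKEN = 2 ** 128  # MD5 digests are 128 bits wide

# Hypothetical four-node cluster with evenly spaced tokens.
ring = TokenRing({
    "node-a": MAX_TOKEN // 4,
    "node-b": MAX_TOKEN // 2,
    "node-c": 3 * MAX_TOKEN // 4,
    "node-d": MAX_TOKEN - 1,
})
```

    Calling `ring.owner("user:42")` always returns the same node for the same key. That determinism is why every node in the cluster has to agree on the partitioner.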

  • SNITCH

    Determines node placement: which datacenter and rack each node belongs to

    Allows replicas to be spread widely enough to handle failures

    Failure modes: Node -> Rack -> DC -> Region

    Tries its best not to place replicas of the same data in the same rack
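
    Rack-aware placement can be sketched as a walk around the ring that prefers nodes in racks not yet used and falls back to already-used racks only when there are not enough distinct ones. This is a simplified illustration, not Cassandra's actual replication strategy code; the function name, node names, and rack map are hypothetical.

```python
def place_replicas(ring_order, rack_of, replication_factor):
    """Pick replicas while walking the ring, preferring racks not used yet.

    ring_order: node names in token order, starting at the primary replica.
    rack_of: mapping node -> rack, the kind of topology a snitch reports.
    """
    replicas, seen_racks, skipped = [], set(), []
    for node in ring_order:
        if len(replicas) == replication_factor:
            break
        if rack_of[node] in seen_racks:
            skipped.append(node)  # same rack as an earlier replica: defer it
        else:
            replicas.append(node)
            seen_racks.add(rack_of[node])
    # Too few distinct racks: fall back to the deferred nodes, in ring order.
    for node in skipped:
        if len(replicas) == replication_factor:
            break
        replicas.append(node)
    return replicas

# Hypothetical topology: n1 and n2 share a rack.
rack_of = {"n1": "rack1", "n2": "rack1", "n3": "rack2", "n4": "rack3"}
```

    With a replication factor of 3 and ring order n1, n2, n3, n4, the sketch skips n2 because it shares a rack with n1 and picks n1, n3, n4 instead.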

  • GOSSIP PROTOCOL

    status

    health

    tokens

    schema version

    data size

    phi_threshold
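
    The phi threshold belongs to an accrual-style failure detector: instead of a fixed timeout, suspicion (phi) grows the longer a peer stays silent relative to its usual heartbeat gap. The sketch below assumes exponentially distributed gaps and a threshold of 8. Both are assumptions made for illustration, and the class is hypothetical rather than Cassandra's implementation.

```python
import math

class PhiAccrualDetector:
    """Toy failure detector: suspicion grows with silence since the last heartbeat."""

    def __init__(self):
        self.last_heartbeat = None
        self.intervals = []

    def heartbeat(self, now):
        # Record the gap between consecutive heartbeats.
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        # Assuming exponentially distributed gaps, P(gap > t) = exp(-t / mean),
        # so phi = -log10(P) = t / (mean * ln 10).
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_heartbeat
        return elapsed / (mean * math.log(10))

    def is_down(self, now, threshold=8.0):
        # The threshold value is an assumption; higher values tolerate longer pauses.
        return self.phi(now) > threshold
```

    A peer that normally heartbeats every second is barely suspected after half a second of silence, but crosses the threshold after roughly 18 to 19 seconds under these assumptions.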

  • DATA MODEL

    As with most databases, the data model is the key to successful deployments and scalability

    Test thoroughly in a staging environment

    Avoid client-side joins as far as possible

    Materialized views: a boon for automated denormalization

    Tune partition size so that oversized partitions do not degrade the cluster
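
    Partition sizing can be reasoned about with simple arithmetic before a schema goes to staging. The sketch below estimates a partition's size and shows one common way to cap it, adding a time bucket to the partition key. The 100 MiB target, the function names, and the sensor example are all assumptions made for illustration.

```python
import math

def estimated_partition_bytes(rows_per_partition, avg_row_bytes):
    """Rough partition size: row count times average serialized row size."""
    return rows_per_partition * avg_row_bytes

def buckets_needed(rows_per_partition, avg_row_bytes, target_bytes):
    """How many buckets one logical partition must be split into to stay under target."""
    size = estimated_partition_bytes(rows_per_partition, avg_row_bytes)
    return max(1, math.ceil(size / target_bytes))

def bucketed_key(sensor_id, timestamp_s, bucket_seconds):
    """Compound partition key: (sensor_id, time bucket index)."""
    return (sensor_id, timestamp_s // bucket_seconds)

TARGET = 100 * 1024 * 1024  # assumed ceiling of 100 MiB per partition
```

    For example, one million rows of about 200 bytes each come to roughly 200 MB, so under the assumed target the data would need at least two buckets per sensor.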

  • ANALYTICS SUPPORT

    DataStax driver for Spark:

    -> Reads localized data off Cassandra nodes

    -> Support for Hadoop

    -> Pig, Hive, Sqoop, Mahout

    -> Solr integration

  • STORAGE

    -> Memtable

    -> SSTable (Sorted String Table)

    -> Index

    -> Partition Summary

    -> Bloom Filter

    -> Compression
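
    How these pieces fit together can be sketched with a toy store: writes land in an in-memory memtable, a full memtable is flushed to an immutable sorted SSTable, and reads consult each SSTable's Bloom filter before searching it. This is a simplified illustration; the class names and the flush threshold are hypothetical, and the index, partition summary, and compression are not modeled.

```python
import bisect
import hashlib

class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits, self.num_hashes, self.bits = size_bits, num_hashes, 0

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

class SSTable:
    """Immutable, key-sorted snapshot of a flushed memtable."""

    def __init__(self, memtable):
        self.keys = sorted(memtable)
        self.values = [memtable[k] for k in self.keys]
        self.bloom = BloomFilter()
        for k in self.keys:
            self.bloom.add(k)

    def get(self, key):
        if not self.bloom.might_contain(key):
            return None  # definitely absent: skip the search entirely
        idx = bisect.bisect_left(self.keys, key)
        if idx < len(self.keys) and self.keys[idx] == key:
            return self.values[idx]
        return None

class Store:
    """Toy write/read path: memtable first, then SSTables from newest to oldest."""

    def __init__(self, flush_threshold=2):
        self.memtable, self.sstables = {}, []
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.sstables.append(SSTable(self.memtable))  # flush to an SSTable
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # the newest SSTable wins
            value = table.get(key)
            if value is not None:
                return value
        return None
```

    Because SSTables are never modified after a flush, an updated key simply appears again in a newer SSTable, and the read path returns the newest copy.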

  • FELLOW DATASTORES

    HBASE

    RIAK

    MONGODB

    AEROSPIKE

    BIGTABLE

    SCYLLA

  • THANK YOU! Bhuvan Rawal