Cassandra Training Session...


  • Cassandra Training Session 2

  • Outline

    Demand for Scalability

    Designing for Scalability

    Cassandra In Theory

    Data model, Architecture, Configuration, Reading and Writing data

    Cassandra In Action [deep]

    Setting up a cluster, Keyspaces, etc

    Accessing data and Monitoring a cluster

  • Demand for Scalability

    Now we are in the Information Age

    Data as well as Consumers are growing

    How much data do you produce / consume in each hour?

    What we need is a data storage with:

    low latency, high throughput, high scalability and availability, low cost, etc.

  • Demand for Scalability Contd

    Can the relational database be the Silver Bullet?

    Its strengths for one use case can become a bottleneck for another use case

    ACID -> two-phase commit (2PC) -> blocking -> not scalable

    Schema -> Normalization -> Lots of Tables -> Lots of JOIN operations and complex queries -> not scalable

  • Designing for Scalability


    [Figure: the data store is split into 26 nodes, one per initial letter. Data Store-L holds words beginning with L and Data Store-S holds words beginning with S. A query for "Leonardo da Vinci" goes to Data Store-L; a query for "Sherlock Holmes" goes to Data Store-S.]
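    The routing idea in this slide can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class `LetterShardedStore`, its methods, and the use of in-memory maps as "nodes" are hypothetical names invented here, not part of Cassandra or of the original deck.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: 26 in-memory "nodes", one per initial letter.
class LetterShardedStore {
    private static final int NODE_COUNT = 26;
    private final List<Map<String, String>> nodes = new ArrayList<>();

    LetterShardedStore() {
        for (int i = 0; i < NODE_COUNT; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // Maps a key to a node index by its first letter: A/a -> 0 ... Z/z -> 25.
    static int nodeFor(String key) {
        char first = Character.toUpperCase(key.charAt(0));
        if (first < 'A' || first > 'Z') {
            throw new IllegalArgumentException("key must start with a letter A-Z: " + key);
        }
        return first - 'A';
    }

    // Only the node responsible for the key's first letter is touched.
    void put(String key, String value) {
        nodes.get(nodeFor(key)).put(key, value);
    }

    String get(String key) {
        return nodes.get(nodeFor(key)).get(key);
    }
}
```

    Each query touches exactly one node, so the nodes share nothing. The obvious weakness is that letters are not equally popular, so some nodes would carry far more load than others.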

  • Designing for Scalability Contd

    Shared-Nothing Architecture

    Data Store-A: words beginning with Y, Z, A

    Data Store-O: words beginning with M, N, O

    Data Store-Z: words beginning with X, Y, Z

    Data Store-P: words beginning with N, O, P

  • Designing for Scalability Contd

    Asynchronous and Non-Blocking Processing (Event-Driven)

  • Cassandra

    Designed for Scalability

    Dynamo's architecture and Bigtable's data model

    CAP (Consistency, Availability, Partition Tolerance) Theorem

    Cassandra favors A and P

  • Cassandra Data Model

    A distributed multidimensional map with 4 or 5 dimensions.

    What about the Relational Model?

    Is it a 4-D map?

    {Database, Table, Row, Column} => Value (Cell)

    {CarbonDB, permission table, resource path, users} => A list of user names

  • Data Model Contd

    4D model: [Keyspace][ColumnFamily][Key][Column]

    Keyspace -> Column Family

    Column Family -> Column Family Row

    Column Family Row -> Columns

    Column -> Data value

    My Yahoo : AddressBook :

    friend_one : { name: foo, phone No: 3234353 }

    {My Yahoo, AddressBook, friend_one, name} => foo
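    The 4D lookup above can be mimicked with nested Java maps. This is a minimal sketch, not how Cassandra stores data: the class name `FourDimensionalMap` and its methods are hypothetical, and the choice of `TreeMap` for rows and columns (so they iterate in sorted order) is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: keyspace -> column family -> row key -> column name -> value.
class FourDimensionalMap {
    private final Map<String, Map<String, TreeMap<String, TreeMap<String, String>>>> data =
            new HashMap<>();

    // Creates the intermediate levels on demand, then stores the cell value.
    void put(String keyspace, String columnFamily, String rowKey, String column, String value) {
        data.computeIfAbsent(keyspace, k -> new HashMap<>())
            .computeIfAbsent(columnFamily, k -> new TreeMap<>())
            .computeIfAbsent(rowKey, k -> new TreeMap<>())
            .put(column, value);
    }

    // Walks the four dimensions; returns null if any level is missing.
    String get(String keyspace, String columnFamily, String rowKey, String column) {
        Map<String, TreeMap<String, TreeMap<String, String>>> families = data.get(keyspace);
        if (families == null) {
            return null;
        }
        TreeMap<String, TreeMap<String, String>> rows = families.get(columnFamily);
        if (rows == null) {
            return null;
        }
        TreeMap<String, String> columns = rows.get(rowKey);
        return columns == null ? null : columns.get(column);
    }
}
```

    With this sketch, `put("My Yahoo", "AddressBook", "friend_one", "name", "foo")` followed by `get("My Yahoo", "AddressBook", "friend_one", "name")` returns `foo`, matching the slide's example.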

  • Data Model Contd

    5D model: [Keyspace][ColumnFamily][Key][SuperColumn][SubColumn]

    Keyspace -> Super Column Family

    Super Column Family -> Super Column Family Row

    Super Column Family Row -> Super Columns

    Super Column -> Columns

    Column -> Data value

    My Yahoo :

    friend_one : { name: foo, phone No: 3234353 } (labelled in the figure as a Super Column)


  • Data Model Contd

    Cluster is a container for keyspaces.

    Rows as well as Columns in a row are sorted.


    personal : { name: foo, address: some, phone: 081 }

    education : { primary: somewhere, secondary: somewhere }

    hobbies : { sports: some details, music: some details }

    ...

    Cassandra needs only one CF or SCF for this

    A relational DB needs several tables (normalization)


  • Data Model Contd

    Secondary Index

    Indexed by columns

    Name    Main God    Conquered By    Archeological Site
    Inca    Sun         Spanish         Machu Picchu
    Maya    Sun         Spanish         Palenque

    Primary Index:

    What is the Main God of the Ancient Maya Civilization?

    Secondary Index:

    What are the civilizations conquered by the Spanish?

    What is the civilization conquered by the Spanish and having the Archeological Site Machu Picchu?
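    The difference between the two kinds of lookup can be sketched as follows. This is an illustration under stated assumptions, not Cassandra's index implementation: `CivilizationIndex`, `row`, `insert`, `getColumn` and `findBy` are hypothetical names, and the secondary index is modelled as a plain in-memory inverted map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model of a primary index (by row key) plus a secondary index (by column value).
class CivilizationIndex {
    // Primary index: row key (civilization name) -> columns.
    private final Map<String, Map<String, String>> rows = new TreeMap<>();
    // Secondary index: column name -> column value -> row keys having that value.
    private final Map<String, Map<String, Set<String>>> secondary = new HashMap<>();

    // Convenience builder for one row of the slide's table.
    static Map<String, String> row(String mainGod, String conqueredBy, String site) {
        Map<String, String> columns = new HashMap<>();
        columns.put("Main God", mainGod);
        columns.put("Conquered By", conqueredBy);
        columns.put("Archeological Site", site);
        return columns;
    }

    // Stores the row and indexes every column value (simplification: no removal of stale entries).
    void insert(String rowKey, Map<String, String> columns) {
        rows.put(rowKey, new HashMap<>(columns));
        for (Map.Entry<String, String> e : columns.entrySet()) {
            secondary.computeIfAbsent(e.getKey(), k -> new HashMap<>())
                     .computeIfAbsent(e.getValue(), k -> new TreeSet<>())
                     .add(rowKey);
        }
    }

    // Primary-index lookup: go straight to the row by its key.
    String getColumn(String rowKey, String column) {
        Map<String, String> columns = rows.get(rowKey);
        return columns == null ? null : columns.get(column);
    }

    // Secondary-index lookup: find row keys by a column's value.
    Set<String> findBy(String column, String value) {
        Map<String, Set<String>> byValue = secondary.get(column);
        if (byValue == null) {
            return Collections.emptySet();
        }
        Set<String> keys = byValue.get(value);
        return keys == null ? Collections.<String>emptySet() : keys;
    }
}
```

    The third question on the slide combines two conditions; in this sketch it would be answered by intersecting the results of two `findBy` calls.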

  • Cassandra Architecture

    System Keyspace

    A peer-to-peer distribution model

    Decentralized -> Availability and Scalability







    Keys mapped to the Token range [0.1,1] => B, C, D

  • Architecture Contd

    Gossip and Failure Detection

    The gossiper runs periodically and tracks which nodes are dead and which are alive.

    Uses the Phi Accrual Failure Detection algorithm

    Supports decentralization and partition tolerance

  • Architecture Contd

    Commit Logs

    A single log file per server

    Provides durability

    [Figure: commit log entry fields: For what CF, Flushed?, Data]

  • Architecture Contd

    Hinted Handoff

    Supports write availability

    [Figure: Node A receives a request for Node B; B is down, so A creates a hint]
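    The hint flow on this slide can be modelled with a small in-memory sketch. Everything here is an assumption made for illustration: `HintedHandoffSketch` plays the role of Node A, and `markDown`, `markUp`, `write` and `read` are hypothetical methods, not Cassandra's internal classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of hinted handoff as seen from the node holding the hints.
class HintedHandoffSketch {
    private final Map<String, Map<String, String>> dataByNode = new HashMap<>();
    private final Map<String, List<String[]>> hintsByTarget = new HashMap<>();
    private final Set<String> downNodes = new HashSet<>();

    void markDown(String node) {
        downNodes.add(node);
    }

    // Applies the write if the target is up; otherwise stores a hint for it.
    // Returns true if applied directly, false if it became a hint.
    boolean write(String target, String key, String value) {
        if (downNodes.contains(target)) {
            hintsByTarget.computeIfAbsent(target, t -> new ArrayList<>())
                         .add(new String[] {key, value});
            return false;
        }
        dataByNode.computeIfAbsent(target, t -> new HashMap<>()).put(key, value);
        return true;
    }

    // Marks the node as up again and replays its hints in arrival order.
    // Returns the number of hints replayed.
    int markUp(String node) {
        downNodes.remove(node);
        List<String[]> hints = hintsByTarget.remove(node);
        if (hints == null) {
            return 0;
        }
        for (String[] hint : hints) {
            dataByNode.computeIfAbsent(node, t -> new HashMap<>()).put(hint[0], hint[1]);
        }
        return hints.size();
    }

    String read(String node, String key) {
        Map<String, String> data = dataByNode.get(node);
        return data == null ? null : data.get(key);
    }
}
```

    The write is accepted even while the target is down, which is what keeps writes available; the data only becomes readable on the target once the hints are replayed.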

  • Architecture Contd


    Compaction

    Two types

    Minor: flush the Memtable and create SSTables

    Major: merge SSTables

    Bloom Filters: is this data with you?
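    A Bloom filter answers "is this data with you?" with either "definitely not" or "maybe". The sketch below is a generic, minimal Bloom filter and not Cassandra's implementation: the class name `SimpleBloomFilter`, the sizes, and the double-hashing scheme built on `String.hashCode()` are all assumptions chosen for brevity.

```java
import java.util.BitSet;

// Hypothetical minimal Bloom filter over String keys.
class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    SimpleBloomFilter(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derives the i-th bit position from two base hashes (double hashing).
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(key, i));
        }
    }

    // false means the key was never added; true means it may have been added.
    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }
}
```

    A `false` answer lets a read skip an SSTable without touching disk; a `true` answer can be a false positive, so the SSTable still has to be checked.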


  • Architecture Contd

    Anti-Entropy and Read Repair

    Replica synchronization

    Staged Event-Driven Architecture (SEDA)

    Read, Mutation, Gossip, Response, Anti-Entropy, Load Balance, Migration, and Streaming

  • Configuring Cassandra

    Replica Placement Strategies

    Simple Strategy

    [Figure: Data Center 1, Data Center 2]

  • Configuring Cassandra Contd

    Old Network Topology Strategy

    [Figure: Data Center 1, Data Center 2]

  • Configuring Cassandra Contd

    Network Topology Strategy

    [Figure: Data Center 1, Data Center 2]

  • Configuring Cassandra Contd

    Replication Factor (RF)

    RF = 3

    Read: R

    Write: W

    Quorum: Q = RF/2 + 1 (integer division)

    Strong consistency: R + W > RF

    With RF = 3 and R = W = Q = 2: 2 + 2 > 3
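    The arithmetic on this slide is small enough to check in code. The helper below is a sketch; `ConsistencyMath`, `quorum` and `isStronglyConsistent` are hypothetical names introduced here, not part of any Cassandra API.

```java
// Hypothetical helpers for the quorum arithmetic on this slide.
class ConsistencyMath {
    // Quorum size for a replication factor, using integer division: RF/2 + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Reads and writes overlap on at least one replica when R + W > RF.
    static boolean isStronglyConsistent(int r, int w, int replicationFactor) {
        return r + w > replicationFactor;
    }
}
```

    For RF = 3 the quorum is 2, and R = W = 2 satisfies the rule. By the same rule, R = 2 with W = 1 gives 3, which is not greater than 3, so that combination does not guarantee strong consistency.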

  • Configuring Cassandra Contd

    Partitioners: how is sharding done?

    Random Partitioner

    Order-Preserving Partitioner

    Byte-Ordered Partitioner

    Good Sharding

    Even workload

    Good Performance
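    To make the idea of hash-based sharding concrete, here is a minimal token-ring sketch in the spirit of a random partitioner. It is an illustration under assumptions, not Cassandra's code: `TokenRing`, `addNode`, `md5Token` and `ownerOf` are hypothetical names, and the rule "the first node whose token is greater than or equal to the key's token owns the key, wrapping around" is the simplification used here.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical token ring: nodes sit at tokens; a key belongs to the next node clockwise.
class TokenRing {
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    void addNode(BigInteger token, String nodeName) {
        ring.put(token, nodeName);
    }

    // Hashes a key with MD5 and interprets the digest as a non-negative integer token.
    static BigInteger md5Token(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Returns the first node whose token is >= the key token, wrapping to the first node.
    String ownerOf(BigInteger keyToken) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<BigInteger, String> entry = ring.ceilingEntry(keyToken);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}
```

    Hashing spreads keys evenly around the ring regardless of how the keys themselves are distributed, which is what gives an even workload; the price is that keys are no longer stored in their natural order, which is the trade-off the order-preserving partitioners make the other way.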

  • Configuring Cassandra Contd

    Snitches: who are my neighbours?

    Simple Snitch

    Comparing different octets in the IP addresses


  • Creating and Managing a Cluster

    The bootstrap token: how do I know which data is mine?

    Seed nodes: I can get my token as well as my data from them

  • Reading Data

    Read at any node!

    Read Request

    RF = 3, R = 2, W = 1

    2 Nodes (Sync)

    1 Node (Async)

    Read Repair

  • Writing Data

    Sequential writes: no seeks

    Write Request

    RF = 3, R = 2, W = 1

  • Monitoring

    Provides rich JMX-based monitoring

    Each SEDA stage

    Caches, Column Family Stores, the Commit Log, and the Compaction Manager

    Some Metrics

    Read Count, Read Latency, Write Count and Write Latency, Pending Tasks

  • Cassandra in Action

    Setting Up a Cluster

    Add / Remove nodes

    Create and Configure a Keyspace

    Create and Configure a CF

    Writing Data

    Reading Data

    Monitoring through JMX

  • Hector Client API

    // Connect to the cluster
    Cluster cluster = HFactory.createCluster(clusterName,
        cassandraHostConfigurator, credentials);

    // Define the keyspace and register it with the cluster
    KeyspaceDefinition definition = HFactory.createKeyspaceDefinition(keyspaceName,
        replicationStrategy, replicationFactor, cfs);
    cluster.addKeyspace(definition);

    ColumnFamilyDefinition cfDefinition = HFactory.createColumnFamilyDefinition( keyspaceName,


  • Hector Client API Contd

    // Writing a column
    Keyspace keyspace = HFactory.createKeyspace(keyspaceName, cluster);
    Mutator<String> mutator = HFactory.createMutator(keyspace, new StringSerializer());
    mutator.insert(rowKey, cfName, HFactory.createStringColumn(name, value));

    // Reading a column (separate snippet)
    Keyspace keyspace = HFactory.createKeyspace(keyspaceName, cluster);
    ColumnQuery<String, String, String> columnQuery = HFactory.createStringColumnQuery(keyspace);
    columnQuery.setColumnFamily(cfName).setKey(rowKey).setName(cName);
    QueryResult<HColumn<String, String>> result = columnQuery.execute();

  • Conclusion

    We discussed the data model, the architecture, and how reads and writes happen

    We did a simple tutorial on how to set up a cluster and write and read data