Cassandra - Research Paper Overview

Cassandra: A Decentralized Structured Storage System. Avinash Lakshman (Facebook), Prashant Malik (Facebook). Presented by Sameera Nelson.


Transcript of Cassandra - Research Paper Overview

Page 1: Cassandra - Research Paper Overview

Cassandra: A Decentralized Structured Storage System

Avinash Lakshman (Facebook), Prashant Malik (Facebook)

Presented by Sameera Nelson

Page 2: Cassandra - Research Paper Overview

Outline …

Introduction

Data Model

System Architecture

Bootstrapping & Scaling

Local Persistence

Conclusion

Page 3: Cassandra - Research Paper Overview

What is Cassandra ?

Distributed Storage System

Manages Structured Data

Highly available, no single point of failure (SPoF)

Not a Relational Data Model

Handles high write throughput

◦ Without sacrificing read efficiency

Page 4: Cassandra - Research Paper Overview

Motivation

Operational Requirements in Facebook

◦ Performance

◦ Reliability/ Dealing with Failures

◦ Efficiency

◦ Continuous Growth

Application

◦ Inbox Search Problem, Facebook

Page 5: Cassandra - Research Paper Overview

Related Work

Google File System

◦ Distributed FS, single master/slave

Ficus/ Coda

◦ Distributed FS

Farsite

◦ Distributed FS, No centralized server

Bayou

◦ Distributed Relational DB System

Dynamo

◦ Distributed Storage system

Page 6: Cassandra - Research Paper Overview

Data Model

Page 7: Cassandra - Research Paper Overview

Data Model

Figure from Eben Hewitt’s slides.

Page 8: Cassandra - Research Paper Overview

Data Model

• Table

◦ Multidimensional map indexed by key

• Columns

◦ Grouped into Column Families: Simple and Super (nested Column Families)

• Column has

◦ Name / Value / Timestamp
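The nesting described on this slide can be sketched as a map of maps. This is only an illustrative sketch: the names `Column`, `Table`, and `put_column` are hypothetical, and the rule that the newer timestamp wins is an assumption made here to show why each column carries a timestamp.

```python
from typing import Dict, NamedTuple


class Column(NamedTuple):
    """A column as listed on the slide: name, value, timestamp."""
    name: str
    value: bytes
    timestamp: int


# row key -> column family name -> column name -> Column
Table = Dict[str, Dict[str, Dict[str, Column]]]


def put_column(table: Table, key: str, family: str, column: Column) -> None:
    """Store a column, keeping the version with the newer timestamp (assumed rule)."""
    columns = table.setdefault(key, {}).setdefault(family, {})
    current = columns.get(column.name)
    if current is None or column.timestamp >= current.timestamp:
        columns[column.name] = column


table: Table = {}
put_column(table, "user42", "profile", Column("fname", b"john", 1))
put_column(table, "user42", "profile", Column("fname", b"jon", 2))
put_column(table, "user42", "profile", Column("fname", b"stale", 1))  # older, ignored
```

After these three calls the stored `fname` column is the one written with timestamp 2.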

Page 9: Cassandra - Research Paper Overview

Supported Operations

insert(table, key, rowMutation)

get(table, key, columnName)

delete(table, key, columnName)
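To make the three operations concrete, here is a minimal in-memory stand-in. The class `InMemoryStore` and the choice to model `rowMutation` as a plain dictionary of column name to value are assumptions for illustration, not Cassandra's actual client API.

```python
class InMemoryStore:
    """Toy store exposing insert/get/delete with the slide's argument order."""

    def __init__(self):
        self._tables = {}  # table name -> row key -> {column name: value}

    def insert(self, table, key, row_mutation):
        # row_mutation is assumed to be a dict of column name -> value
        row = self._tables.setdefault(table, {}).setdefault(key, {})
        row.update(row_mutation)

    def get(self, table, key, column_name):
        # Returns None when the table, row, or column is missing
        return self._tables.get(table, {}).get(key, {}).get(column_name)

    def delete(self, table, key, column_name):
        # Deleting a missing column is a no-op
        self._tables.get(table, {}).get(key, {}).pop(column_name, None)


store = InMemoryStore()
store.insert("users", "1745", {"fname": "john", "lname": "smith"})
first_name = store.get("users", "1745", "fname")
store.delete("users", "1745", "fname")
```

A real node would route each call to the replicas responsible for `key`; this sketch only shows the shape of the interface.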

Page 10: Cassandra - Research Paper Overview

Query Language

CREATE TABLE users (
    user_id int PRIMARY KEY,
    fname text,
    lname text );

INSERT INTO users (user_id, fname, lname)
    VALUES (1745, 'john', 'smith');

SELECT * FROM users;

Page 11: Cassandra - Research Paper Overview

System Architecture

Page 12: Cassandra - Research Paper Overview

Fully Distributed … No Single Point of Failure

Page 13: Cassandra - Research Paper Overview

Cassandra Architecture

Partitioning

◦ Data distribution across nodes

Replication

◦ Data duplication across nodes

Cluster Membership

◦ Node management in the cluster (adding/deleting nodes)

Page 14: Cassandra - Research Paper Overview

Partitioning

The Token Ring

Page 15: Cassandra - Research Paper Overview

Partitioning

Partitions data using consistent hashing

Page 16: Cassandra - Research Paper Overview

Partitioning

Assignment into the relevant partition
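The token ring and the assignment of a key to its partition can be sketched as follows. `TokenRing` and `token_for` are hypothetical names, and MD5 is used only as an example hash; the idea shown is that a key belongs to the first node at or after the key's position, walking clockwise and wrapping around.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Hash a key to a position on the ring (MD5 chosen only for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")


class TokenRing:
    def __init__(self, node_tokens):
        # node_tokens: dict of node name -> token (position on the ring)
        self._ring = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._ring]

    def owner(self, key_token: int) -> str:
        """First node whose token is >= key_token, wrapping past the end."""
        i = bisect.bisect_left(self._tokens, key_token)
        return self._ring[i % len(self._ring)][1]


ring = TokenRing({"A": 100, "B": 200, "C": 300})
```

With this ring, position 150 falls to node B, while any position above 300 wraps around to node A. Adding or removing a node only moves the keys in the range next to it.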

Page 17: Cassandra - Research Paper Overview

Replication

Based on configured replication factor

Page 18: Cassandra - Research Paper Overview

Replication

Different Replication Policies

◦ Rack Unaware

Replicate at the N-1 successor nodes on the ring

◦ Rack Aware

Uses a leader elected via Zookeeper

◦ Data center Aware

Similar to Rack Aware, with the leader chosen at the data center level.
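The Rack Unaware policy is the simplest to sketch: the node that owns a key holds one copy and the next N-1 nodes clockwise on the ring hold the others. The function name and the list-based ring below are assumptions for illustration; the rack-aware and data-center-aware policies need topology information that this sketch leaves out.

```python
def replicas_rack_unaware(ring_nodes, coordinator_index, replication_factor):
    """Return the coordinator plus its N-1 successors on the ring.

    ring_nodes is assumed to be ordered by token; the replica count is
    capped at the ring size so no node is listed twice.
    """
    count = min(replication_factor, len(ring_nodes))
    return [
        ring_nodes[(coordinator_index + offset) % len(ring_nodes)]
        for offset in range(count)
    ]


nodes = ["A", "B", "C", "D"]
replica_set = replicas_rack_unaware(nodes, coordinator_index=2, replication_factor=3)
```

Here the coordinator is C, so the replica set is C, D, and then A after wrapping around the ring.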

Page 19: Cassandra - Research Paper Overview

Cluster Membership

Based on Scuttlebutt

Efficient gossip-based mechanism

Inspired by real-life rumor spreading.

Anti-entropy protocol

◦ Repairs replicated data by comparing and reconciling differences
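A toy simulation can show how gossip spreads membership state. This is a simplified push-pull exchange, not the Scuttlebutt protocol itself: every name below is hypothetical, and each node's state is reduced to a single heartbeat version per known node, reconciled by keeping the higher version.

```python
import random


def merge(local, remote):
    """Reconcile two views by keeping the higher version for each node."""
    for node, version in remote.items():
        if version > local.get(node, -1):
            local[node] = version


def gossip_round(views, rng):
    """Each node bumps its own heartbeat and exchanges views with one random peer."""
    names = list(views)
    for name in names:
        views[name][name] += 1
        peer = rng.choice([n for n in names if n != name])
        merge(views[peer], views[name])
        merge(views[name], views[peer])


def run_until_converged(node_names, seed=0, max_rounds=100):
    """Return the number of rounds until every node knows every other node."""
    rng = random.Random(seed)
    views = {name: {name: 0} for name in node_names}
    for round_no in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(set(view) == set(node_names) for view in views.values()):
            return round_no
    return None


rounds = run_until_converged(["a", "b", "c", "d", "e"])
```

Each node starts out knowing only itself, and knowledge of the full membership spreads in a few rounds without any central coordinator.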

Page 20: Cassandra - Research Paper Overview

Cluster Membership

Gossip Based

Page 21: Cassandra - Research Paper Overview

Cluster Membership

Failure Detection

◦ Accrual Failure Detector

If a node is faulty, the suspicion level increases.

Φ(t) → k as t → k, where k is a threshold variable

◦ If node is correct

Φ(t) = 0
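One way to compute a suspicion level of this kind is sketched below. It assumes heartbeat inter-arrival times follow an exponential distribution with a known mean, so that Φ is the negative base-10 logarithm of the probability that a heartbeat arrives later than the time already waited. The function names and the threshold value are illustrative assumptions.

```python
import math


def phi(time_since_last_heartbeat: float, mean_interval: float) -> float:
    """Suspicion level under an assumed exponential inter-arrival model.

    P(later than t) = exp(-t / mean), so
    phi = -log10(P) = t / (mean * ln 10).
    """
    return time_since_last_heartbeat / (mean_interval * math.log(10))


def is_suspected(time_since_last_heartbeat, mean_interval, threshold=8.0):
    """Treat the node as failed once phi crosses the threshold (value assumed)."""
    return phi(time_since_last_heartbeat, mean_interval) >= threshold


just_heard = phi(0.0, mean_interval=1.0)
long_silence = phi(30.0, mean_interval=1.0)
```

The value is 0 right after a heartbeat and grows the longer a node stays silent, so the application can choose how much suspicion it tolerates instead of receiving a fixed up/down answer.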

Page 22: Cassandra - Research Paper Overview

Bootstrapping & Scaling

Page 23: Cassandra - Research Paper Overview

Bootstrapping & Scaling

Bootstrapping

◦ Node selects a random token

◦ Token is persisted locally and gossiped to the cluster

Scaling

◦ Cassandra bootstrap algorithm is initiated by an operator

◦ New node gets a split of the range of a heavily loaded node
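The range split during scaling can be sketched as picking a token halfway through the heavily loaded node's range. The function name, the midpoint rule, and the ring size are assumptions for illustration; the slide only states that the new node takes over part of the loaded node's range.

```python
def split_token(predecessor_token: int, loaded_token: int, ring_size: int = 2 ** 127) -> int:
    """Token for a new node that halves the range (predecessor, loaded_token].

    The modulo arithmetic handles ranges that wrap past the end of the ring.
    """
    span = (loaded_token - predecessor_token) % ring_size
    return (predecessor_token + span // 2) % ring_size


new_token = split_token(100, 200)
wrapped_token = split_token(2 ** 127 - 10, 10)
```

With the new node placed at token 150, it becomes responsible for the keys between 100 and 150 and the loaded node keeps the rest of its range.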

Page 24: Cassandra - Research Paper Overview

Local Persistence

Page 25: Cassandra - Research Paper Overview

Local Persistence

Write Operation

Page 26: Cassandra - Research Paper Overview

Local Persistence

Write Operation

◦ Flush to disk after threshold

◦ Sequential entries, with an index for each data file

◦ Data file merging

◦ Rolling commit logs
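The write path on this slide can be sketched with in-memory stand-ins for the on-disk structures. `LocalStore`, its field names, and the entry-count threshold are hypothetical; a real node writes the commit log and data files to disk and uses a size-based threshold.

```python
class LocalStore:
    """Toy write path: commit log, in-memory table, flush, and merge."""

    def __init__(self, flush_threshold=2):
        self.commit_log = []   # sequential record of writes, for durability
        self.memtable = {}     # in-memory view of recent writes
        self.data_files = []   # flushed files, oldest first, each sorted by key
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # log first
        self.memtable[key] = value            # then update memory
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Dump the in-memory table as a sorted, immutable data file
        self.data_files.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []  # flushed data no longer needs the log: roll it

    def compact(self):
        # Merge all data files; later files overwrite earlier ones
        merged = {}
        for data_file in self.data_files:
            merged.update(data_file)
        self.data_files = [sorted(merged.items())]


store = LocalStore(flush_threshold=2)
for key, value in [("b", 1), ("a", 2), ("a", 3), ("c", 4)]:
    store.write(key, value)
store.compact()
```

Four writes with a threshold of two produce two data files, and merging them leaves a single file in which the newer value for key `a` wins.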

Page 27: Cassandra - Research Paper Overview

Local Persistence

Read Operation

◦ Indexes all data on primary key

◦ Maintains column indices
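A read can be sketched as a lookup in the in-memory table followed by a search of the flushed data files from newest to oldest. The functions below are hypothetical, and binary search over sorted keys stands in for the per-file key index; column indices within a row are not modeled.

```python
import bisect


def lookup(data_file, key):
    """Binary-search one sorted data file of (key, value) pairs."""
    keys = [k for k, _ in data_file]
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return data_file[i][1]
    return None


def read(memtable, data_files, key):
    """Check memory first, then data files from newest to oldest."""
    if key in memtable:
        return memtable[key]
    for data_file in reversed(data_files):
        value = lookup(data_file, key)
        if value is not None:
            return value
    return None


files = [[("a", 1), ("b", 2)], [("a", 3)]]  # oldest first
latest_a = read({}, files, "a")
```

Searching the newest file first means the most recent value for a key is found without consulting older files.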

Page 28: Cassandra - Research Paper Overview

Conclusion

Page 29: Cassandra - Research Paper Overview

Conclusion

Proven high scalability, performance, and wide applicability

Very high update throughput, delivering low latency

Future work

◦ Adding compression

◦ Support for atomicity across keys

◦ Secondary index support

Page 30: Cassandra - Research Paper Overview

Thank You