CASSANDRA - A Decentralized Structured Storage System
![Page 1: CASSANDRA -A Decentralized Structured Storage S ystem](https://reader036.fdocuments.net/reader036/viewer/2022062318/568165fa550346895dd92cb9/html5/thumbnails/1.jpg)
CASSANDRA - A Decentralized Structured Storage System
Presented By Sadhana Kuthuru
![Page 2: CASSANDRA -A Decentralized Structured Storage S ystem](https://reader036.fdocuments.net/reader036/viewer/2022062318/568165fa550346895dd92cb9/html5/thumbnails/2.jpg)
OVERVIEW:
• Introduction
• Data Model
• API
• System architecture
• Facebook Inbox Search
• Conclusion
![Page 3: CASSANDRA -A Decentralized Structured Storage S ystem](https://reader036.fdocuments.net/reader036/viewer/2022062318/568165fa550346895dd92cb9/html5/thumbnails/3.jpg)
GOOD QUOTE!
Google, Amazon, Facebook and DARPA all recognized that when you scale systems large enough, you can never put enough iron in one place to get the job done (and you wouldn't want to, to prevent a single point of failure). Once you accept that you have a distributed system, you need to give up consistency or availability, which the fundamental transactionality of traditional RDBMSs cannot abide.
- Cedric Beust
![Page 4: CASSANDRA -A Decentralized Structured Storage S ystem](https://reader036.fdocuments.net/reader036/viewer/2022062318/568165fa550346895dd92cb9/html5/thumbnails/4.jpg)
Why NoSQL (features):
It provides:
• Horizontal scalability
• Open-source
• Schema-freeness
• Easy replication support
• Simple API
![Page 5: CASSANDRA -A Decentralized Structured Storage S ystem](https://reader036.fdocuments.net/reader036/viewer/2022062318/568165fa550346895dd92cb9/html5/thumbnails/5.jpg)
CAP (for NoSQL)
![Page 6: CASSANDRA -A Decentralized Structured Storage S ystem](https://reader036.fdocuments.net/reader036/viewer/2022062318/568165fa550346895dd92cb9/html5/thumbnails/6.jpg)
NEED FOR CASSANDRA AT FACEBOOK:
• Scalability
• Availability
• Replication
• Fault tolerance
• Eventual consistency
• Read/write performance
• Flexible schema
![Page 7: CASSANDRA -A Decentralized Structured Storage S ystem](https://reader036.fdocuments.net/reader036/viewer/2022062318/568165fa550346895dd92cb9/html5/thumbnails/7.jpg)
DATA MODEL:
• A table is a multidimensional map indexed by a row key.
• Every operation under a single row key is atomic per replica.
• Columns are grouped into two kinds of column families:
 - Simple column family
 - Super column family (a column family within a column family)
• Each column has:
 - Name
 - Value
 - Timestamp
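The data model above can be pictured as nested maps. The following is a minimal Python sketch, not Cassandra's actual classes: `Column`, `simple_cf`, `super_cf`, `newest`, and all sample keys and values are hypothetical names chosen for illustration. The `newest` helper assumes the timestamp is used to pick the latest version of a column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """One column: a name, a value, and a timestamp."""
    name: str
    value: str
    timestamp: int


# Simple column family: row key -> column name -> Column.
simple_cf = {
    "user42": {
        "email": Column("email", "u42@example.com", 1001),
        "city": Column("city", "Austin", 1002),
    }
}

# Super column family: row key -> super column name -> column name -> Column.
super_cf = {
    "user42": {
        "friends": {
            "user7": Column("user7", "close", 1003),
        }
    }
}


def newest(a: Column, b: Column) -> Column:
    """Assumed conflict rule: keep the version with the larger timestamp."""
    return a if a.timestamp >= b.timestamp else b
```

The super column family simply adds one more level of nesting under the row key.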
![Page 8: CASSANDRA -A Decentralized Structured Storage S ystem](https://reader036.fdocuments.net/reader036/viewer/2022062318/568165fa550346895dd92cb9/html5/thumbnails/8.jpg)
DATA MODEL:
*Figure taken from slides by Eben Hewitt (author of O'Reilly's Cassandra book).
![Page 9: CASSANDRA -A Decentralized Structured Storage S ystem](https://reader036.fdocuments.net/reader036/viewer/2022062318/568165fa550346895dd92cb9/html5/thumbnails/9.jpg)
CASSANDRA API:
The Cassandra API consists of the following three methods:
• insert(table, key, rowMutation)
• get(table, key, columnName)
• delete(table, key, columnName)
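To make the shape of these three calls concrete, here is a minimal in-memory sketch in Python. `ToyStore` is a hypothetical stand-in, not the real Cassandra client, and it assumes `rowMutation` can be modeled as a plain dictionary of column updates.

```python
class ToyStore:
    """In-memory stand-in for the three calls; not the real Cassandra client."""

    def __init__(self):
        # table -> row key -> column name -> value
        self._tables = {}

    def insert(self, table, key, row_mutation):
        """Apply a dict of column updates to one row."""
        row = self._tables.setdefault(table, {}).setdefault(key, {})
        row.update(row_mutation)

    def get(self, table, key, column_name):
        """Return the column value, or None if it is absent."""
        return self._tables.get(table, {}).get(key, {}).get(column_name)

    def delete(self, table, key, column_name):
        """Remove one column from a row if it exists."""
        self._tables.get(table, {}).get(key, {}).pop(column_name, None)


store = ToyStore()
store.insert("Inbox", "user42", {"subject": "hello", "from": "user7"})
print(store.get("Inbox", "user42", "subject"))  # hello
store.delete("Inbox", "user42", "subject")
print(store.get("Inbox", "user42", "subject"))  # None
```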
![Page 10: CASSANDRA -A Decentralized Structured Storage S ystem](https://reader036.fdocuments.net/reader036/viewer/2022062318/568165fa550346895dd92cb9/html5/thumbnails/10.jpg)
SYSTEM ARCHITECTURE:
PARTITIONING
• Cassandra can dynamically partition the data over the set of nodes in the cluster.
• Uses an order-preserving hash function.
• Load balancing: lightly loaded nodes move position on the ring to alleviate heavily loaded nodes.
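A minimal sketch of ring partitioning in Python. As a simplifying assumption, the key string itself is used as the ring position, which is the simplest order-preserving mapping; the `Ring` class and the node names and tokens are hypothetical.

```python
import bisect


class Ring:
    """Toy ring: each node owns the keys up to and including its token."""

    def __init__(self, node_tokens):
        # node_tokens: dict of node name -> token (a string in key space)
        self._pairs = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._pairs]

    def owner(self, key):
        """Return the first node whose token is >= key, wrapping around."""
        i = bisect.bisect_left(self._tokens, key)
        if i == len(self._tokens):
            i = 0  # past the last token: wrap to the start of the ring
        return self._pairs[i][1]


ring = Ring({"A": "g", "B": "n", "C": "t"})
print(ring.owner("apple"))  # A
print(ring.owner("mango"))  # B
print(ring.owner("zebra"))  # A (wraps around)
```

Because the mapping preserves order, keys that sort next to each other land on the same node or on neighboring nodes.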
![Page 11: CASSANDRA -A Decentralized Structured Storage S ystem](https://reader036.fdocuments.net/reader036/viewer/2022062318/568165fa550346895dd92cb9/html5/thumbnails/11.jpg)
PARTITIONING:
![Page 12: CASSANDRA -A Decentralized Structured Storage S ystem](https://reader036.fdocuments.net/reader036/viewer/2022062318/568165fa550346895dd92cb9/html5/thumbnails/12.jpg)
REPLICATION:
• How data is duplicated across nodes.
• Uses replication to achieve high availability and durability.
• Different replication policies:
 - Rack Unaware
 - Rack Aware
 - Datacenter Aware
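A minimal sketch of replica placement in Python, assuming the Rack Unaware policy places copies on the node that owns the key plus its next N-1 successors on the ring. The `replicas` function, the string tokens, and the node names are hypothetical; the rack-aware and datacenter-aware policies are not modeled.

```python
import bisect


def replicas(tokens_to_nodes, key, n):
    """Return the key's owner plus the next n-1 nodes clockwise on the ring."""
    tokens = sorted(tokens_to_nodes)
    start = bisect.bisect_left(tokens, key) % len(tokens)
    count = min(n, len(tokens))  # cannot place more copies than nodes
    return [tokens_to_nodes[tokens[(start + i) % len(tokens)]] for i in range(count)]


ring = {"g": "A", "n": "B", "t": "C", "z": "D"}
print(replicas(ring, "mango", 3))  # ['B', 'C', 'D']
print(replicas(ring, "u", 3))      # ['D', 'A', 'B']
```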
![Page 13: CASSANDRA -A Decentralized Structured Storage S ystem](https://reader036.fdocuments.net/reader036/viewer/2022062318/568165fa550346895dd92cb9/html5/thumbnails/13.jpg)
FAILURE DETECTION:
• A mechanism by which a node can locally determine whether any other node in the system is up or down.
• Failure detection is provided by an accrual failure detector, which outputs a suspicion level Φ.
• If a node is faulty, the suspicion level Φ(t) increases monotonically with time t. Once Φ(t) exceeds a threshold k (which depends on system load), the node is considered dead.
![Page 14: CASSANDRA -A Decentralized Structured Storage S ystem](https://reader036.fdocuments.net/reader036/viewer/2022062318/568165fa550346895dd92cb9/html5/thumbnails/14.jpg)
FAILURE DETECTION:
• If a node is correct, Φ stays bounded by a constant set by the application. Generally Φ(t) = 0 right after a heartbeat arrives.
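A minimal sketch of an accrual failure detector in Python. It assumes heartbeat gaps follow an exponential distribution, so the suspicion level is Φ(t) = -log10(P(gap > t)) = t / (mean · ln 10). The `AccrualDetector` class and the `THRESHOLD` value are hypothetical and not taken from the slides.

```python
import math


class AccrualDetector:
    """Toy accrual failure detector for one monitored node."""

    def __init__(self):
        self._last = None      # arrival time of the latest heartbeat
        self._intervals = []   # gaps between consecutive heartbeats

    def heartbeat(self, now):
        """Record a heartbeat that arrived at time `now` (seconds)."""
        if self._last is not None:
            self._intervals.append(now - self._last)
        self._last = now

    def phi(self, now):
        """Suspicion level: -log10 of P(next heartbeat arrives later than now)."""
        if not self._intervals:
            return 0.0  # not enough history to suspect anything
        mean = sum(self._intervals) / len(self._intervals)
        if mean <= 0:
            return 0.0  # degenerate history; this sketch does not model it
        elapsed = now - self._last
        # Exponential model: P(gap > elapsed) = exp(-elapsed / mean).
        return elapsed / (mean * math.log(10))


d = AccrualDetector()
for t in (0.0, 1.0, 2.0, 3.0):
    d.heartbeat(t)
THRESHOLD = 5.0  # hypothetical threshold k
print(d.phi(3.0))               # 0.0
print(d.phi(20.0) > THRESHOLD)  # True
```

The suspicion level drops back to zero on every heartbeat and grows without bound while heartbeats stay away, so the application only has to choose the threshold.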
![Page 15: CASSANDRA -A Decentralized Structured Storage S ystem](https://reader036.fdocuments.net/reader036/viewer/2022062318/568165fa550346895dd92cb9/html5/thumbnails/15.jpg)
BOOTSTRAPPING:
• Two ways to add a new node:
 - The new node is assigned a random token, which gives its position in the ring. It gossips its location to the rest of the ring.
 - The new node reads its configuration files to contact the initial contact points.
• An administrator uses a command-line tool or a browser to initiate the addition and removal of nodes from a Cassandra instance.
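A minimal sketch of the random-token case in Python. It shows that when a node joins at a random position, the only ring positions that change owner are the ones the new node takes over. The `RING_SIZE` token space, the `owner` and `bootstrap` helpers, and the node names are hypothetical, and gossip is not modeled.

```python
import bisect
import random

RING_SIZE = 2 ** 32  # hypothetical token space


def owner(ring, position):
    """ring: sorted list of (token, node). Return the node owning `position`."""
    tokens = [tok for tok, _ in ring]
    i = bisect.bisect_left(tokens, position) % len(ring)
    return ring[i][1]


def bootstrap(ring, node, rng):
    """Give `node` a random token and return the ring that includes it."""
    token = rng.randrange(RING_SIZE)
    return sorted(ring + [(token, node)]), token


rng = random.Random(7)
old_ring = [(1_000_000_000, "A"), (2_000_000_000, "B"), (3_000_000_000, "C")]
new_ring, token = bootstrap(old_ring, "D", rng)

# Only positions in the range the new node took over change owner.
samples = [rng.randrange(RING_SIZE) for _ in range(1000)]
moved = [p for p in samples if owner(old_ring, p) != owner(new_ring, p)]
print(all(owner(new_ring, p) == "D" for p in moved))  # True
```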
![Page 16: CASSANDRA -A Decentralized Structured Storage S ystem](https://reader036.fdocuments.net/reader036/viewer/2022062318/568165fa550346895dd92cb9/html5/thumbnails/16.jpg)
SCALING THE CLUSTER:
• Lightly loaded nodes can move to alleviate heavily loaded nodes.
• The Cassandra bootstrap algorithm is initiated.
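One way to picture how a node can relieve a heavily loaded peer is to give it a token in the middle of that peer's range, so it takes over roughly half of it. The Python sketch below assumes integer tokens and a per-node load measure; `RING_SIZE`, `split_token`, and the sample loads are hypothetical.

```python
RING_SIZE = 2 ** 32  # hypothetical token space


def split_token(ring, load):
    """Return a token halfway through the range of the most loaded node.

    ring: sorted list of (token, node); load: node -> load measure.
    """
    idx = max(range(len(ring)), key=lambda i: load[ring[i][1]])
    token = ring[idx][0]
    prev_token = ring[idx - 1][0]  # idx - 1 == -1 wraps to the last node
    width = (token - prev_token) % RING_SIZE  # range size, wrap-safe
    return (prev_token + width // 2) % RING_SIZE


ring = [(1000, "A"), (5000, "B"), (9000, "C")]
load = {"A": 10, "B": 90, "C": 20}
print(split_token(ring, load))  # 3000
```

A node placed at token 3000 would own the range from 1000 to 3000, leaving node B with the other half of its former range.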
![Page 17: CASSANDRA -A Decentralized Structured Storage S ystem](https://reader036.fdocuments.net/reader036/viewer/2022062318/568165fa550346895dd92cb9/html5/thumbnails/17.jpg)
FACEBOOK INBOX SEARCH:
• Cassandra was designed to fulfill the storage needs of the Inbox Search problem.
• Inbox Search enables users to search through their Facebook inbox.
• Two kinds of search features:
 - Term search: search by a keyword
 - Interactions search: search by a user ID
![Page 18: CASSANDRA -A Decentralized Structured Storage S ystem](https://reader036.fdocuments.net/reader036/viewer/2022062318/568165fa550346895dd92cb9/html5/thumbnails/18.jpg)
FACEBOOK INBOX SEARCH:
• To make searches fast, it provides buffer caching of data.
• Currently stores 50+ TB of data on a 150-node cluster.
| Latency Stat | Search Interactions | Term Search |
|--------------|---------------------|-------------|
| Min          | 7.69 ms             | 7.78 ms     |
| Median       | 15.69 ms            | 18.27 ms    |
| Max          | 26.13 ms            | 44.41 ms    |
![Page 19: CASSANDRA -A Decentralized Structured Storage S ystem](https://reader036.fdocuments.net/reader036/viewer/2022062318/568165fa550346895dd92cb9/html5/thumbnails/19.jpg)
APACHE CASSANDRA:
• After Facebook open-sourced the code in 2008, Facebook Cassandra became Apache Cassandra in 2010.
• Some of the Cassandra deployments include:
 - Netflix, Twitter, Adobe
 - HP, IBM, Cisco
 - Digg, Rackspace, Reddit
![Page 20: CASSANDRA -A Decentralized Structured Storage S ystem](https://reader036.fdocuments.net/reader036/viewer/2022062318/568165fa550346895dd92cb9/html5/thumbnails/20.jpg)
CONCLUSION:
Cassandra meets Facebook's storage requirements:
• Incremental growth.
• Regular check of component failure.
• Data optimization from special operations.
• Simple architecture.
• Fault tolerance.
![Page 21: CASSANDRA -A Decentralized Structured Storage S ystem](https://reader036.fdocuments.net/reader036/viewer/2022062318/568165fa550346895dd92cb9/html5/thumbnails/21.jpg)
THANK YOU. ANY QUESTIONS?