CASSANDRA - Next to RDBMS
CASSANDRA – An Open Source Data Storage system
Presented By: Vipul Kumar, Cr No. - 11/269
UNIVERSITY COLLEGE OF ENGINEERING, KOTA RAJASTHAN TECHNICAL UNIVERSITY
Presented To: Mr R K Banyal Sir, CSE Department
COMPUTER SCIENCE AND ENGINEERING DEPARTMENT
Contents
• What is Cassandra?
• History
• Data Model
• System architecture
• Key features and benefits
• Who is using Cassandra?
• Conclusion and future scope
Apache Cassandra™ is a free, open source NoSQL database that is:
• Distributed
• High performance
• Extremely scalable
• Fault tolerant (i.e. no single point of failure)
Definition of Cassandra
• Builds on two earlier systems: Google's Bigtable (data model) and Amazon's Dynamo (distributed architecture).
The history of Cassandra
• A table is a multidimensional map indexed by a key (the row key).
• Columns are grouped into Column Families.
• 2 types of Column Families:
– Simple
– Super (nested Column Families)
• Each Column has:
– Name
– Value
– Timestamp
Data Model
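The map-of-maps structure described on this slide can be sketched in a few lines of Python. This is only an illustration, not Cassandra's storage code: the class names `Column` and `ColumnFamily` and the methods `insert` and `get` are made up for the example, and the rule that the column with the highest timestamp wins is an assumption about how the timestamp is used.

```python
import time
from collections import defaultdict
from typing import NamedTuple


class Column(NamedTuple):
    name: str
    value: str
    timestamp: int  # assumed here to decide which write is the newest


class ColumnFamily:
    """Hypothetical simple column family: row key -> column name -> Column."""

    def __init__(self, name):
        self.name = name
        self.rows = defaultdict(dict)  # row key -> {column name: Column}

    def insert(self, row_key, column_name, value, timestamp=None):
        ts = time.time_ns() if timestamp is None else timestamp
        current = self.rows[row_key].get(column_name)
        # Assumption: keep the column with the highest timestamp.
        if current is None or ts >= current.timestamp:
            self.rows[row_key][column_name] = Column(column_name, value, ts)

    def get(self, row_key, column_name):
        column = self.rows.get(row_key, {}).get(column_name)
        return None if column is None else column.value


users = ColumnFamily("users")
users.insert("user1", "city", "Kota", timestamp=1)
users.insert("user1", "city", "Jaipur", timestamp=2)
print(users.get("user1", "city"))
```

A super column family would add one more level of nesting (row key -> super column name -> column name -> Column).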
• Partitioning: how data is partitioned across nodes
• Replication: how data is duplicated across nodes
System Architecture
• Nodes are logically structured in a ring topology.
• The hashed value of the key associated with a data partition is used to assign it to a node in the ring.
• The hash space wraps around after its largest value, which gives the ring structure.
• Lightly loaded nodes move position on the ring to relieve heavily loaded nodes.
Partitioning
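A minimal sketch of ring partitioning, assuming each node takes a single position on the ring derived from hashing its name. The class `Ring`, the method `node_for`, the node names, the SHA-256 hash and the ring size of 2**32 are all illustrative choices, not details from the slides. The load-based repositioning of nodes is not modelled.

```python
import bisect
import hashlib


class Ring:
    """Hypothetical hash ring: each key goes to the next node clockwise."""

    def __init__(self, nodes, ring_size=2**32):
        self.ring_size = ring_size
        # Each node owns one position (token) on the ring.
        self.tokens = sorted((self._hash(node), node) for node in nodes)

    def _hash(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        # The modulo makes positions wrap around, which forms the ring.
        return int(digest, 16) % self.ring_size

    def node_for(self, key):
        position = self._hash(key)
        positions = [token for token, _ in self.tokens]
        # First node whose token is at or after the key's position.
        index = bisect.bisect_left(positions, position)
        # Past the last token, wrap around to the first node.
        return self.tokens[index % len(self.tokens)][1]


ring = Ring(["node-a", "node-b", "node-c"])
print(ring.node_for("user:42"))
```

The same key always maps to the same node, so any node can work out where a key lives without a central directory.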
• Each data item is replicated at N nodes, where N is the replication factor.
• Different replication policies:
– Rack Unaware – replicate data at the N-1 successive nodes after its coordinator
– Rack Aware – uses ‘Zookeeper’ to choose a leader, which tells nodes the range they are replicas for
– Datacenter Aware – similar to Rack Aware, but the leader is chosen at datacenter level instead of rack level
Replication
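The Rack Unaware policy is simple enough to sketch directly: the coordinator keeps one copy and the next N-1 nodes on the ring keep the others. The function name `rack_unaware_replicas` and the node names are hypothetical, and the list `ring_nodes` is assumed to be in ring order already.

```python
def rack_unaware_replicas(ring_nodes, coordinator, n):
    """Return the coordinator plus the next n-1 nodes clockwise on the ring."""
    if not 1 <= n <= len(ring_nodes):
        raise ValueError("replication factor must be between 1 and the node count")
    start = ring_nodes.index(coordinator)
    # Modulo wraps the walk around the end of the list back to the start.
    return [ring_nodes[(start + i) % len(ring_nodes)] for i in range(n)]


print(rack_unaware_replicas(["A", "B", "C", "D"], coordinator="C", n=3))
```

With four nodes and N = 3, a data item coordinated by C is stored on C, D and A. The Rack Aware and Datacenter Aware policies need a leader election, which this sketch leaves out.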
Gossip Protocol
• Network communication protocol inspired by real-life rumor spreading.
• Periodic, pairwise, inter-node communication.
• Low-frequency communication keeps the cost low.
• Random selection of peers.
• Example – Node A wishes to search for a pattern in the data:
– Round 1 – Node A searches locally and then gossips with node B.
– Round 2 – Nodes A and B gossip with C and D.
– Round 3 – Nodes A, B, C and D gossip with 4 other nodes ……
• Round-by-round doubling makes the protocol very robust.
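The round-by-round spread can be simulated with a toy model. This is a sketch under simple assumptions, not Cassandra's gossip implementation: in every round each informed node contacts one randomly chosen other node, nodes are plain integers, and the function name `gossip_rounds` and its `seed` parameter are invented for the example. Because random peers can overlap, doubling is the best case rather than a guarantee.

```python
import random


def gossip_rounds(node_count, seed=0):
    """Return how many nodes are informed after each round of push gossip."""
    rng = random.Random(seed)  # fixed seed keeps the run repeatable
    informed = {0}             # node 0 plays the role of node A
    history = [len(informed)]
    while len(informed) < node_count:
        contacted = set()
        for node in sorted(informed):
            # Pick a random peer other than the node itself.
            peer = rng.randrange(node_count - 1)
            if peer >= node:
                peer += 1
            contacted.add(peer)
        informed |= contacted
        history.append(len(informed))
    return history


print(gossip_rounds(16))
```

Each entry in the returned list is the informed-node count after one more round. The count can at most double per round, so reaching 16 nodes takes at least 4 rounds.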
Key features & benefits
• Gigabyte to petabyte scalability
• Big data scalability
• No single point of failure
• Easy replication / data distribution
• No need for caching software
• Flexible schema
Big Data Scalability
• Capable of comfortably scaling to petabytes
• New nodes = linear performance increases
• Add new nodes online
[Figure: a 2-node ring grows to a 4-node ring. Double throughput capacity.]
No single point of failure
• All nodes are the same
• Read/write from any node
• Can replicate data among different physical data center racks
Easy Replication
• Handled transparently by Cassandra
• Multi data center capable
• Exploits all the benefits of cloud computing
No need for caching layer
• The peer-to-peer design removes the need for a separate caching layer and the programming that goes with it.
• The database uses the memory of all participating nodes to cache the data assigned to them.
Flexible Schema
• Dynamic schema design allows more flexible data storage than a rigid RDBMS schema.
• Handles structured, semi-structured and unstructured data.
• No downtime for schema changes.
Who uses Cassandra
Conclusion & future scope
• Cassandra is an open source storage system providing scalability, high performance, and wide applicability.
• Cassandra can support a very high update throughput while delivering low latency.
• Future work involves adding compression, support for atomicity across keys, and secondary index support.
Thank You