CASSANDRA - Next to RDBMS



Page 1: CASSANDRA - Next to RDBMS

CASSANDRA – An Open Source Data Storage system

Presented By: Vipul Kumar, Cr No. - 11/269

UNIVERSITY COLLEGE OF ENGINEERING, KOTA RAJASTHAN TECHNICAL UNIVERSITY

Presented To: Mr R K Banyal Sir, CSE Department

COMPUTER SCIENCE AND ENGINEERING DEPARTMENT

Page 2: CASSANDRA - Next to RDBMS

Contents

• What is Cassandra?
• History
• Data Model
• System architecture
• Key features and benefits
• Who is using Cassandra?
• Conclusion and future scope


Page 3: CASSANDRA - Next to RDBMS

Apache Cassandra™ is a free

• Distributed
• High performance
• Extremely scalable
• Fault tolerant (i.e. no single point of failure)

open source NoSQL database.

Definition of Cassandra

Page 4: CASSANDRA - Next to RDBMS

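• Bigtable (Google): the source of Cassandra's column-family data model
• Dynamo (Amazon): the source of its ring-based partitioning and replication design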

The history of Cassandra

Page 5: CASSANDRA - Next to RDBMS

• Table is a multi-dimensional map indexed by key (row key).

• Columns are grouped into Column Families.
• 2 Types of Column Families
– Simple
– Super (nested Column Families)

• Each Column has
– Name
– Value
– Timestamp

Data Model
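
The map-of-maps structure described on this slide can be sketched in a few lines of Python. This is only an illustration of a simple column family, not Cassandra's storage engine: the class names `Column` and `ColumnFamily`, the row keys and the sample values are all hypothetical, and the rule that the newest timestamp wins when a column is written twice is stated here as an assumption of the sketch. A super column family would add one more level of nesting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """A column as listed on the slide: name, value, timestamp."""
    name: str
    value: str
    timestamp: int


class ColumnFamily:
    """Simple column family: row key -> {column name -> Column}."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def insert(self, row_key, column):
        row = self.rows.setdefault(row_key, {})
        current = row.get(column.name)
        # Assumption of this sketch: the write with the newest timestamp wins.
        if current is None or column.timestamp >= current.timestamp:
            row[column.name] = column

    def get(self, row_key, column_name):
        return self.rows[row_key][column_name].value


users = ColumnFamily("Users")
users.insert("user1", Column("email", "old@example.com", 1))
users.insert("user1", Column("email", "new@example.com", 2))
users.insert("user1", Column("city", "Kota", 1))
# A different row may carry a different set of columns.
users.insert("user2", Column("dept", "CSE", 1))
```

Each row is looked up by its row key first and by column name second, which is what makes the table a multi-dimensional map.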

Page 6: CASSANDRA - Next to RDBMS

Data Model

Page 7: CASSANDRA - Next to RDBMS

• Partitioning: how data is partitioned across nodes

• Replication: how data is duplicated across nodes

System Architecture

Page 8: CASSANDRA - Next to RDBMS

• Nodes are logically structured in a ring topology.

• The hashed value of the key associated with a data partition is used to assign it to a node in the ring.

• The hash space wraps around after its largest value, which gives the ring structure.

• Lightly loaded nodes move position on the ring to relieve heavily loaded nodes.

Partitioning
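
A minimal sketch of the ring lookup described on this slide, under stated assumptions: the hash function (MD5 truncated to 32 bits), the ring size and the node names are chosen only for illustration and are not taken from Cassandra's partitioner. Each node is placed on the ring at the hash of its name, and a key belongs to the first node at or after the key's position, wrapping back to the start of the ring.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # assumed size of the hash space for this sketch


def ring_position(key: str) -> int:
    """Hash a key to a position on the ring (MD5 is an illustrative choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % RING_SIZE


class Ring:
    def __init__(self, node_names):
        # Place every node on the ring and keep the ring sorted by position.
        self.tokens = sorted((ring_position(name), name) for name in node_names)
        self.positions = [position for position, _ in self.tokens]

    def owner_of_position(self, position: int) -> str:
        # First node at or after the position; past the last node we wrap
        # around to the first one, which is what closes the ring.
        index = bisect.bisect_left(self.positions, position)
        return self.tokens[index % len(self.tokens)][1]

    def owner_of_key(self, key: str) -> str:
        return self.owner_of_position(ring_position(key))


ring = Ring(["node-a", "node-b", "node-c"])  # hypothetical node names
owner = ring.owner_of_key("user1")
```

Moving a lightly loaded node to a different position changes only the ranges of that node and its neighbour on the ring, so the rest of the data stays where it is.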

Page 9: CASSANDRA - Next to RDBMS

• Each data item is replicated at N (replication factor) nodes.

• Different Replication Policies
– Rack Unaware: replicate data at N-1 successive nodes after its coordinator
– Rack Aware: uses ‘Zookeeper’ to choose a leader which tells nodes the range they are replicas for
– Datacenter Aware: similar to Rack Aware but leader is chosen at Datacenter level instead of Rack level.

Replication
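
The Rack Unaware policy is simple enough to sketch directly: the coordinator keeps one copy and the next N-1 nodes on the ring keep the others. The function name `replicas_for` and the node labels are hypothetical, and the ring is reduced to a plain ordered list for the purpose of the example.

```python
def replicas_for(ring_nodes, coordinator_index, replication_factor):
    """Return the coordinator plus the N-1 nodes that follow it on the ring."""
    # A cluster cannot hold more replicas than it has nodes.
    count = min(replication_factor, len(ring_nodes))
    return [
        ring_nodes[(coordinator_index + step) % len(ring_nodes)]
        for step in range(count)
    ]


ring_nodes = ["A", "B", "C", "D", "E"]  # nodes in ring order
print(replicas_for(ring_nodes, coordinator_index=3, replication_factor=3))
# ['D', 'E', 'A']
```

The modulo step is what lets the replica list wrap from the last node of the ring back to the first.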

Page 10: CASSANDRA - Next to RDBMS

Replication

Page 11: CASSANDRA - Next to RDBMS

Gossip Protocol

• A network communication protocol inspired by real-life rumor spreading.

• Periodic, pairwise, inter-node communication.
• Low-frequency communication keeps the cost low.
• Random selection of peers.
• Example: Node A wishes to search for a pattern in data
– Round 1: Node A searches locally and then gossips with node B.
– Round 2: Nodes A and B gossip with C and D.
– Round 3: Nodes A, B, C and D gossip with 4 other nodes ……

• Round-by-round doubling makes the protocol very robust.
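
The rounds above can be simulated with a small sketch. It is a toy model, not Cassandra's gossip implementation: the function name `gossip_rounds` and the fixed seed are assumptions made for the example. Because peers are picked at random, an informed node sometimes contacts a node that already knows, so the informed count at most doubles per round; the clean 1, 2, 4, 8 progression on the slide is the best case.

```python
import random


def gossip_rounds(num_nodes, seed=0):
    """Simulate push gossip; return the number of informed nodes per round."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    rng = random.Random(seed)
    informed = {0}          # node 0 plays the role of node A
    history = [len(informed)]
    while len(informed) < num_nodes:
        contacted = set()
        for node in sorted(informed):
            # Pick a random peer other than the node itself.
            peer = rng.randrange(num_nodes - 1)
            if peer >= node:
                peer += 1
            contacted.add(peer)
        informed |= contacted
        history.append(len(informed))
    return history


history = gossip_rounds(16, seed=1)
print(history)  # informed-node count after each round
```

Reaching all 16 nodes takes at least 4 rounds, since the count can never more than double from one round to the next.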

Page 12: CASSANDRA - Next to RDBMS

Key features & benefits

• Gigabyte to Petabyte scalability
• Big data scalability
• No single point of failure
• Easy Replication / Data distribution
• No need for caching software
• Flexible Schema

Page 13: CASSANDRA - Next to RDBMS

Big Data Scalability

• Capable of comfortably scaling to petabytes
• New nodes = linear performance increases
• Add new nodes online

[Figure: a cluster growing from 2 nodes to 4 nodes, labelled "Double throughput capacity"]

Page 14: CASSANDRA - Next to RDBMS

No single point of failure

• All nodes are the same
• Read/write from any node
• Can replicate data across different physical data center racks

Page 15: CASSANDRA - Next to RDBMS

Easy Replication

• Transparently handled by Cassandra
• Multi data center capable
• Exploits all the benefits of cloud computing

Page 16: CASSANDRA - Next to RDBMS

No need for caching layer

• The peer-to-peer design removes the need for a separate caching layer and the programming that goes with it.

• The database uses the memory of all participating nodes to cache the data assigned to them.

Page 17: CASSANDRA - Next to RDBMS

Flexible Schema

• Dynamic schema design allows for more flexible data storage than a rigid RDBMS schema

• Handles structured, semi-structured and unstructured data.
• No offline time / downtime for schema changes

Page 18: CASSANDRA - Next to RDBMS

Who uses Cassandra

Page 19: CASSANDRA - Next to RDBMS

Conclusion & future scope

• Cassandra is an open source storage system providing scalability, high performance, and wide applicability.

• Cassandra can support a very high update throughput while delivering low latency.

• Future work involves adding compression, support for atomicity across keys, and secondary index support.

Page 20: CASSANDRA - Next to RDBMS

Thank You