Introduction to Data Modeling in Cassandra

BarCamp Kerala 2015

Who am I?

Software Engineer at RapidValue
Backend Engineer of Gudly
Author of Flask-CQLAlchemy

What is Cassandra?

Massively scalable NoSQL database
High throughput, with nearly linear scaling for suitable use cases
Row-column oriented, with an SQL-like approach to querying using CQL (Cassandra Query Language)

Brief History

Created at Facebook by Avinash Lakshman (one of the authors of Amazon's Dynamo) and Prashant Malik
Released as open source in 2008
Became an Apache top-level project in 2010

Best Use-Cases

Playlists & Collections
Sensor Data
Personalization and recommendation engines
Messaging
Fraud Detection

Notable features

No single point of failure
Clearly defined table schema in a NoSQL environment
Near-linear horizontal scaling across commodity servers
No joins
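Near-linear horizontal scaling and the absence of a single point of failure both come from spreading data across a ring of peer nodes by hashing the partition key. The sketch below is a simplified toy that assumes one token per node. It uses MD5 from Python's standard library as a stand-in for Cassandra's default Murmur3 partitioner, and the node names and the `TokenRing` class are hypothetical. It shows that adding a node moves only the keys that fall into the new node's range.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Stand-in hash: Cassandra's default Murmur3Partitioner is not in the
    # standard library, so MD5 is used here purely for illustration.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Hypothetical toy ring: each node owns the keys up to its own token."""

    def __init__(self, nodes):
        self._ring = sorted((token(n), n) for n in nodes)

    def add_node(self, node):
        bisect.insort(self._ring, (token(node), node))

    def owner(self, partition_key: str) -> str:
        tokens = [t for t, _ in self._ring]
        # First node whose token is >= the key's token, wrapping around the ring.
        i = bisect.bisect_left(tokens, token(partition_key)) % len(self._ring)
        return self._ring[i][1]


ring = TokenRing(["node-a", "node-b", "node-c"])
keys = [f"user-{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}
ring.add_node("node-d")
after = {k: ring.owner(k) for k in keys}

# Only keys in the new node's range move, and they all move to node-d.
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), all(after[k] == "node-d" for k in moved))
```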

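Brewer's Conjecture, a.k.a. the “CAP Theorem”

Consistency: all nodes see the same data at the same time
Availability: every request receives a response, whether it succeeded or failed
Partition Tolerance: the system keeps operating even when network failures split the nodes into groups that cannot communicate

Cassandra is an AP database: it favors availability and partition tolerance, with consistency tunable per request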
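Cassandra's AP behavior can be tuned. Each read or write can specify a consistency level such as ONE, QUORUM or ALL, which sets how many replicas must respond. A common way to reason about this is the overlap rule in the sketch below. If the number of replicas read plus the number of replicas written exceeds the replication factor, every read reaches at least one replica that has the latest write. This is a minimal arithmetic sketch. The function names are hypothetical, and it ignores failures and repair.

```python
def quorum(replication_factor: int) -> int:
    # QUORUM is a majority of replicas: floor(RF / 2) + 1.
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    # The read set and the write set are forced to overlap only when R + W > RF.
    return read_replicas + write_replicas > replication_factor


rf = 3
print(quorum(rf))                                          # 2
print(reads_see_latest_write(quorum(rf), quorum(rf), rf))  # True  (QUORUM reads, QUORUM writes)
print(reads_see_latest_write(1, 1, rf))                    # False (ONE reads, ONE writes)
print(reads_see_latest_write(1, rf, rf))                   # True  (ONE reads, ALL writes)
```

QUORUM on both sides is the usual middle ground. ONE/ONE favors latency and availability, at the cost of possibly stale reads.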

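RDBMS vs Cassandra: Querying

SQL for querying

SELECT * FROM users WHERE name = 'John Doe';

CQL for querying

SELECT * FROM users WHERE name = 'John Doe';

The syntax is the same. In CQL, however, this query only succeeds if name is the partition key of users (or has a secondary index, or the query adds ALLOW FILTERING).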
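In CQL, a WHERE clause is expected to restrict the partition key, because the partition key determines which partition holds a row. An equality match on the partition key is therefore a direct lookup, while a match on any other column means examining every partition. The toy model below illustrates the difference in plain Python. It assumes a `users` table whose partition key is a hypothetical `user_id` column.

```python
from dataclasses import dataclass


@dataclass
class User:
    user_id: str
    name: str


# Toy "users" table: the partition key (user_id) maps directly to its row.
users_by_id = {
    "u1": User("u1", "John Doe"),
    "u2": User("u2", "Jane Roe"),
}


def get_by_partition_key(user_id):
    # One direct lookup, analogous to WHERE user_id = ? in CQL.
    return users_by_id.get(user_id)


def filter_by_name(name):
    # Visits every partition, which is the cost ALLOW FILTERING accepts.
    return [u for u in users_by_id.values() if u.name == name]


print(get_by_partition_key("u1"))  # User(user_id='u1', name='John Doe')
print(filter_by_name("John Doe"))  # [User(user_id='u1', name='John Doe')]
```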

Data Modeling

Collection and analysis of data requirements
Identification of participating entities and relationships
Identification of data access patterns

Data Modeling

A particular way of organizing and structuring data
Design and specification of a database schema
Schema optimization and data indexing techniques

Products of Data Modeling

Conceptual Data Model

Technology-independent, unified view of the data
Examples: entity-relationship model, dimensional model

Conceptual Data Model: Entity Relationship Diagram

Products of Data Modeling

Logical Data Model

Specific to Cassandra
Column family diagrams (Chebotko diagrams)
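A Chebotko diagram lists each table's columns together with their roles. The notation marks partition key columns with K and clustering columns with C↑ (ascending) or C↓ (descending). The sketch below is a hypothetical text-only rendering of that notation, not a tool from the methodology itself. It is applied to the `emp` table from the physical-model slide, with empID as the partition key and deptID as a clustering column.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    cql_type: str
    role: str = ""  # "K" = partition key, "C↑"/"C↓" = clustering column, "" = regular column


@dataclass
class LogicalTable:
    name: str
    columns: list = field(default_factory=list)

    def render(self) -> str:
        # Table name first, then one line per column with its role marker (if any).
        lines = [self.name]
        for col in self.columns:
            lines.append(f"  {col.name} {col.cql_type} {col.role}".rstrip())
        return "\n".join(lines)


emp = LogicalTable("emp", [
    Column("empID", "int", "K"),
    Column("deptID", "int", "C↑"),
    Column("first_name", "varchar"),
    Column("last_name", "varchar"),
])
print(emp.render())
```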

Modeling Guidelines

Writes are cheap, reads are not
Joins are not possible
Duplication is good
Secondary indexes add read latency
All data required to answer a query must be stored in a single column family (table)
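The “duplication is good” and “no joins” guidelines usually translate into one table per query, with every write going to each table. This toy sketch uses Python dictionaries as stand-ins for two hypothetical tables, `emp_by_id` and `emps_by_dept`. In a real cluster, the two writes would typically be grouped in a logged BATCH so the tables stay in sync.

```python
from collections import defaultdict

# Two query-specific "tables" holding copies of the same employee data (toy model).
emp_by_id = {}                    # query: find an employee by ID
emps_by_dept = defaultdict(dict)  # query: list the employees in a department


def insert_employee(emp_id, dept_id, first_name, last_name):
    row = {"emp_id": emp_id, "dept_id": dept_id,
           "first_name": first_name, "last_name": last_name}
    # Writes are cheap, so the row is duplicated into every table that needs it.
    emp_by_id[emp_id] = dict(row)
    emps_by_dept[dept_id][emp_id] = dict(row)


insert_employee(1, 10, "John", "Doe")
insert_employee(2, 10, "Jane", "Roe")
insert_employee(3, 20, "Raj", "Menon")

# Each query is a single lookup in one table; no join is needed.
print(emp_by_id[2]["first_name"])  # Jane
print(sorted(emps_by_dept[10]))    # [1, 2]
```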

Data Modeling Methodology

For each query:
Identify a subset of the conceptual data model that describes the query data
Apply a suitable mapping pattern to the subset and the query
Use a Chebotko diagram to describe the result as a logical model
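One simplified mapping rule in this style of methodology is that attributes the query matches with equality form the partition key, while attributes it sorts or ranges over become clustering columns. The helper below is hypothetical and covers only that one rule. It does not check that the resulting key uniquely identifies rows. It turns a description of a single query into a CQL CREATE TABLE statement.

```python
def table_for_query(table, columns, equality, ordering):
    """Hypothetical helper: derive a CQL table definition from one access pattern.

    columns  -- dict of column name -> CQL type
    equality -- attributes the query filters with '=' (become the partition key)
    ordering -- attributes the results are sorted or ranged by (become clustering columns)
    """
    col_defs = ",\n".join(f"    {name} {cql_type}" for name, cql_type in columns.items())
    # A multi-column partition key needs its own parentheses in CQL.
    partition = f"({', '.join(equality)})" if len(equality) > 1 else equality[0]
    key = ", ".join([partition, *ordering])
    return f"CREATE TABLE {table} (\n{col_defs},\n    PRIMARY KEY ({key})\n);"


# Query: "find the employees in a department, ordered by employee ID"
print(table_for_query(
    "emps_by_dept",
    {"dept_id": "int", "emp_id": "int", "first_name": "text", "last_name": "text"},
    equality=["dept_id"],
    ordering=["emp_id"],
))
```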

Products of Data Modeling

Physical Data Model

Specific to Cassandra
CQL definitions

Physical Data Model

CQL CREATE statement:

    -- empID is the partition key; deptID is a clustering column
    CREATE TABLE emp (
        empID int,
        deptID int,
        first_name varchar,
        last_name varchar,
        PRIMARY KEY (empID, deptID)
    );
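With PRIMARY KEY (empID, deptID), the first column (empID) is the partition key and the remaining column (deptID) is a clustering column. All rows for one employee therefore live in one partition, sorted by deptID. The plain-Python toy below mimics that layout. It uses hypothetical `insert` and `select_partition` helpers and also reproduces Cassandra's upsert behavior when a primary key is written twice.

```python
from collections import defaultdict

# Toy model of PRIMARY KEY (empID, deptID):
# partition key (empID) -> {clustering key (deptID) -> row}
partitions = defaultdict(dict)


def insert(emp_id, dept_id, first_name, last_name):
    # Writing an existing (empID, deptID) overwrites that row (an upsert).
    partitions[emp_id][dept_id] = (first_name, last_name)


def select_partition(emp_id):
    # Rows in a partition come back ordered by the clustering column (ascending by default).
    rows = partitions.get(emp_id, {})
    return [(dept_id, *rows[dept_id]) for dept_id in sorted(rows)]


insert(1, 20, "John", "Doe")
insert(1, 10, "John", "Doe")
insert(2, 10, "Jane", "Roe")
insert(1, 20, "Johnny", "Doe")  # same (empID, deptID): replaces the earlier row

print(select_partition(1))  # [(10, 'John', 'Doe'), (20, 'Johnny', 'Doe')]
```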

RDBMS vs Cassandra

Cassandra is equally good for complex and simple data
All data required to answer a query must be stored in a single column family (table)
Data modeling is driven by queries as well as by data
Data duplication is considered normal

Cassandra in Production

Netflix
Spotify
Twitter

Contact

Twitter: @_thegeorgeous

Github: http://github.com/thegeorgeous