Introduction to Data Modeling in Cassandra

BarCamp Kerala 2015

Who am I?

Software Engineer at RapidValue
Backend Engineer of Gudly
Author of Flask-CQLAlchemy

What is Cassandra?

Massively, linearly scalable NoSQL database
High throughput with nearly linear scaling for suitable use cases
Row-column oriented, with a SQL-like approach using CQL

Brief History

Created by Avinash Lakshman (one of the creators of Amazon's Dynamo) and Prashant Malik
Released as open source in 2008
Became an Apache top-level project in 2010

Best Use-Cases

Playlists & Collections
Sensor Data
Personalization and recommendation engines
Messaging
Fraud Detection

Notable features

No single point of failure
Clearly defined table schema in a NoSQL environment
Near-linear horizontal scaling across commodity servers
No joins

Brewer's Conjecture, a.k.a. “CAP Theorem”

Consistency – All nodes see the same data at any given time
Availability – Every request receives a response, whether it succeeded or failed
Partition Tolerance – The system keeps operating when nodes fail or cannot reach each other

Cassandra is an AP database

RDBMS vs Cassandra: Querying

SQL for querying

SELECT * FROM users WHERE name = 'John Doe';

CQL for querying

SELECT * FROM users WHERE name = 'John Doe';

Data Modeling

Collection and analysis of data requirements
Identification of participating entities and relationships
Identification of data access patterns

Data Modeling

A particular way of organizing and structuring data
Design and specification of a database schema
Schema optimization and data indexing techniques

Products of Data Modeling

Conceptual Data model

Technology-independent, unified view of data
Entity-relationship model, dimensional model, etc.

Conceptual Data Model

[Figure: Entity Relationship Diagram]

Logical Data model

Specific to Cassandra
Column family diagrams (Chebotko diagrams)

Modeling Guidelines

Writes are cheap, reads are not
Joins are not possible
Duplication is good
Indexing creates latency
All data required to answer a query must be nested in a single column family
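The guidelines above can be illustrated with a small in-memory sketch in Python. This is not Cassandra code: plain dictionaries stand in for column families, and the names (videos_by_user, videos_by_tag, add_video) are hypothetical, chosen only for illustration. The point is that one write fans out to every query-specific table, so each read becomes a single key lookup with no join.

```python
from collections import defaultdict

# Each dict stands in for one column family, keyed by its partition key.
# All names here are hypothetical and exist only for this illustration.
videos_by_user = defaultdict(list)  # answers "videos uploaded by a user"
videos_by_tag = defaultdict(list)   # answers "videos carrying a tag"


def add_video(user, title, tags):
    """Write the same video to every table that some query reads from."""
    row = {"user": user, "title": title, "tags": list(tags)}
    videos_by_user[user].append(row)
    for tag in tags:
        videos_by_tag[tag].append(row)  # deliberate duplication


def titles_for_user(user):
    """One lookup in one table; no join needed."""
    return [row["title"] for row in videos_by_user.get(user, [])]


def titles_for_tag(tag):
    """A second query gets its own table instead of an index or a join."""
    return [row["title"] for row in videos_by_tag.get(tag, [])]


if __name__ == "__main__":
    add_video("alice", "Intro to CQL", ["cassandra", "cql"])
    add_video("bob", "Ring topology", ["cassandra"])
    print(titles_for_user("alice"))
    print(titles_for_tag("cassandra"))
```

The cost of this layout is extra work at write time and extra storage, which is the trade the guidelines recommend because writes are cheap.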

Data Modeling Methodology

For each query:
Identify a subset of the conceptual data model that describes the query data
Apply a suitable mapping pattern to the subset and the query
Use a Chebotko diagram to describe this as a logical model
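As a rough sketch of what applying a mapping pattern can look like, the Python helper below turns one query description into a CQL table definition. It assumes one commonly used rule: columns searched by equality form the partition key, columns used for ordering or ranges become clustering columns, and everything else is a regular column. The function name table_for_query and its parameters are hypothetical, and the sketch does not check that the resulting primary key identifies rows uniquely.

```python
def table_for_query(table, types, equality, ordering=(), extra=()):
    """Build a CQL CREATE TABLE string for a single query.

    table    -- table name, usually named after the query
    types    -- dict mapping column name to CQL type
    equality -- columns the query filters on with '=' (partition key)
    ordering -- columns the query sorts or ranges over (clustering columns)
    extra    -- remaining columns the query returns
    """
    if not equality:
        raise ValueError("a query needs at least one equality column")

    columns = list(equality) + list(ordering) + list(extra)
    col_defs = [f"  {name} {types[name]}," for name in columns]

    # A multi-column partition key needs its own parentheses in CQL.
    partition = ", ".join(equality)
    if len(equality) > 1:
        partition = f"({partition})"

    key_parts = [partition] + list(ordering)
    primary_key = f"  PRIMARY KEY ({', '.join(key_parts)})"
    return "\n".join([f"CREATE TABLE {table} (", *col_defs, primary_key, ");"])


if __name__ == "__main__":
    # Hypothetical query: "latest videos for a given user".
    print(table_for_query(
        "videos_by_user",
        {"user_id": "uuid", "added_at": "timestamp", "title": "text"},
        equality=["user_id"],
        ordering=["added_at"],
        extra=["title"],
    ))
```

Running the helper once per query yields one table per access pattern, which is the query-driven approach the methodology describes.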

Physical Data model

Specific to Cassandra
CQL definitions

Physical Data Model

CQL CREATE statement

CREATE TABLE emp (
    empID int,
    deptID int,
    first_name varchar,
    last_name varchar,
    PRIMARY KEY (empID, deptID)
);
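In the emp table above, the first column of the primary key (empID) is the partition key and the second (deptID) is a clustering column. The Python sketch below models that layout in memory as a rough illustration only: nested dictionaries stand in for partitions, and the helper names insert and select_by_emp are hypothetical. It shows two consequences of the key design: rows within a partition come back ordered by deptID, and writing the same (empID, deptID) twice overwrites the row rather than adding a duplicate.

```python
from collections import defaultdict

# partition key (empID) -> clustering column (deptID) -> remaining columns
emp = defaultdict(dict)


def insert(emp_id, dept_id, first_name, last_name):
    """Upsert: a second write with the same (empID, deptID) replaces the row."""
    emp[emp_id][dept_id] = {"first_name": first_name, "last_name": last_name}


def select_by_emp(emp_id):
    """Return the rows of one partition, ordered by the clustering column."""
    partition = emp.get(emp_id, {})
    return [
        (emp_id, dept_id,
         partition[dept_id]["first_name"], partition[dept_id]["last_name"])
        for dept_id in sorted(partition)
    ]


if __name__ == "__main__":
    insert(1, 20, "John", "Doe")
    insert(1, 10, "John", "Doe")
    insert(1, 10, "John", "Smith")  # same primary key: overwrites the row
    for row in select_by_emp(1):
        print(row)
```

A lookup by empID touches a single partition, which is why the partition key should match the column a query filters on.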

RDBMS vs Cassandra

Cassandra is equally good for complex and simple data
All data required to answer a query must be nested in a single column family
Data modeling methodology is driven by queries and data
Data duplication is considered normal

Cassandra in Production

Netflix
Spotify
Twitter

References

http://academy.datastax.com
http://www.slideshare.net/nkorla1share/cass-summit-3
http://docs.datastax.com
http://planetcassandra.com

Contact

Email: iamgeorgethomas@gmail.com

Twitter: @_thegeorgeous

Github: http://github.com/thegeorgeous
