Cassandra Day Denver 2014: So, You Want to Use Cassandra?

This talk covers what to consider when evaluating Cassandra, seen through a Pearson team's recent adoption of Cassandra after coming from a .NET/SQL world. Topics include data model design, operationalization of a cluster, and other best practices, along with what happens when they aren't followed.

Introduction

So You Want To Use Cassandra?

Lessons Learned Implementing Cassandra at Pearson

Data Model

Data Modeling

● Know not only your data, but how you plan to retrieve it

● Can Cassandra store it in an easily retrievable manner?

● Will the data scale well without breaking Cassandra?

About Your Data...

● Data partitioning strategy

● Know how you need to search your data

● Limit the number of updates and deletes on data that must be indexed

● Denormalize ALL THE THINGS

Things C* Does Well

● Non-relational Data

● Permanent Data

● Storing Data as it should be viewed

Things C* Does NOT Do Well

● Constructible Views Across Data

● Queue-like Data Patterns

● Highly Volatile Indexed Data

Searching Your Data

● Do not rely on a single column family to handle all lookups

● A single set of data can live in multiple column families, one for each way you need to look it up

● Avoid secondary indexes in almost all use cases
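
The bullets above can be illustrated with a minimal Python sketch. It models two column families as plain dictionaries, each keyed by a different lookup value, and writes every record to both. The `User` record, the table names, and the `save_user` helper are hypothetical and chosen only for illustration; on a real cluster these would be two tables with different partition keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    email: str
    name: str


# Hypothetical stand-ins for two column families holding the same data,
# each keyed by the value that one kind of query looks up.
users_by_id = {}
users_by_email = {}


def save_user(user):
    """Write the same record to every lookup table (a denormalized write)."""
    users_by_id[user.user_id] = user
    users_by_email[user.email] = user


save_user(User("u1", "ada@example.com", "Ada"))

# Each read is a single key lookup; no secondary index is involved.
print(users_by_id["u1"].name)
print(users_by_email["ada@example.com"].user_id)
```

The cost of this design is extra writes and extra storage; the benefit is that every query is answered by one table built for it.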

Searching Your Data (continued)

● Avoid indexing volatile data

● Limit your lookups to single partitions where possible
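
A rough sketch of why single-partition lookups matter: the partition key decides which node owns the data, so a query over one partition talks to one node, while a query over many partitions fans out. The node names are hypothetical, MD5 is only a stand-in for the cluster's partitioner, and the model ignores replication and the real token ring.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical 4-node cluster


def node_for(partition_key):
    """Map a partition key to a node (MD5 used as a stand-in hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


def nodes_touched(partition_keys):
    """Return the set of nodes a query over these partitions must contact."""
    return {node_for(key) for key in partition_keys}


# A lookup restricted to one partition always lands on a single node.
single = nodes_touched(["course-42"])

# A lookup spanning many partitions can fan out across the cluster.
many = nodes_touched(["course-%d" % i for i in range(100)])

print(len(single), len(many))
```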

Tombstones

How to Kill Your Cassandra Service

Tombstones

● Cassandra’s mechanism for handling deletes in a distributed fashion

● Created whenever a row or column is deleted or an indexed value is updated

● Essentially timestamped soft deletes

● Can cause your lookups to fail inexplicably when a single read scans too many of them (100,000 by default)
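
The behaviour described on this slide can be modelled with a toy partition in Python. This is a simplified sketch, not Cassandra's storage engine: the `Partition` class and `TombstoneOverflow` exception are hypothetical names, and the threshold constant simply reuses the 100,000 figure from the slide.

```python
TOMBSTONE_FAILURE_THRESHOLD = 100_000  # figure quoted on the slide


class TombstoneOverflow(Exception):
    """Raised when a read scans more tombstones than the threshold allows."""


class Partition:
    """Toy partition: each column maps to (timestamp, value or None)."""

    def __init__(self):
        self.cells = {}

    def write(self, column, value, ts):
        # Last write wins: keep the cell with the newest timestamp.
        current = self.cells.get(column)
        if current is None or ts >= current[0]:
            self.cells[column] = (ts, value)

    def delete(self, column, ts):
        # A delete does not remove the cell; it writes a timestamped marker.
        self.write(column, None, ts)

    def read(self, threshold=TOMBSTONE_FAILURE_THRESHOLD):
        live, tombstones = {}, 0
        for column, (_, value) in self.cells.items():
            if value is None:
                tombstones += 1
                if tombstones > threshold:
                    raise TombstoneOverflow("scanned %d tombstones" % tombstones)
            else:
                live[column] = value
        return live


p = Partition()
for i in range(5):
    p.write("c%d" % i, i, ts=1)
for i in range(3):
    p.delete("c%d" % i, ts=2)

print(p.read())  # only the live cells come back, but the tombstones were scanned
```

The point of the model is that deleted cells still have to be read and skipped, so a partition that is mostly tombstones is expensive to query even though it returns almost nothing.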

Managing Tombstones

● Avoid data models that:

  ○ Update indexed columns

  ○ Have too many deletes

  ○ Need to query data across partitions

● Try to make your data as immutable as possible

● Fine-tune your garbage collection settings
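
The slide does not say which settings it means. Assuming it refers to the tombstone grace period (the `gc_grace_seconds` table option) rather than JVM garbage collection, the sketch below shows the idea: compaction may only drop a tombstone once it is older than the grace period. The `compact` function and the sample cells are hypothetical.

```python
GC_GRACE_SECONDS = 864_000  # Cassandra's default gc_grace_seconds (10 days)


def compact(cells, now, gc_grace=GC_GRACE_SECONDS):
    """Drop tombstones older than the grace period; keep everything else.

    `cells` maps column -> (timestamp, value); a value of None marks a tombstone.
    """
    return {
        column: (ts, value)
        for column, (ts, value) in cells.items()
        if value is not None or now - ts < gc_grace
    }


cells = {
    "old_delete": (0, None),           # tombstone written at t=0
    "recent_delete": (900_000, None),  # tombstone written much later
    "live": (0, "kept"),
}
after = compact(cells, now=1_000_000)
print(sorted(after))
```

A shorter grace period clears tombstones sooner, but if a replica misses a delete and is not repaired within that window, the deleted data can reappear.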

Operationalization

Maintaining a C* Cluster

Operationalization

● Cassandra requires more maintenance than most RDBMS

● Strange, difficult-to-debug issues will arise when your cluster is neglected

● You need to run maintenance jobs regularly to keep the cluster healthy and consistent

● Possibly perform major compactions to help keep reads performant
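
The slides do not list the maintenance jobs. Assuming they include anti-entropy repair (`nodetool repair`), one simple way to keep it regular is to stagger one repair per node across a fixed window. The sketch below only prints a schedule; the node names, start date, and the choice of a 10-day window (Cassandra's default tombstone grace period) are assumptions.

```python
from datetime import datetime, timedelta

GC_GRACE = timedelta(days=10)  # assumed window (Cassandra's default grace period)


def repair_schedule(nodes, start, window=GC_GRACE):
    """Spread one repair per node evenly across `window`, starting at `start`."""
    step = window / len(nodes)
    return [(node, start + i * step) for i, node in enumerate(nodes)]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]  # hypothetical names
schedule = repair_schedule(nodes, datetime(2014, 6, 1))
for node, when in schedule:
    # The command is only printed; nothing is executed against a cluster.
    print("%s  nodetool repair  (run on %s)" % (when.strftime("%Y-%m-%d %H:%M"), node))
```

Repeating the same schedule every window means each node is repaired once per grace period without all nodes repairing at the same time.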

Thank You