Cassandra Day Denver 2014: So, You Want to Use Cassandra?
Transcript of Cassandra Day Denver 2014: So, You Want to Use Cassandra?
Introduction
So You Want To Use Cassandra?
Lessons Learned Implementing Cassandra at Pearson
Data Model
Data Modeling
● Know not only your data, but how you plan to retrieve it
● Can Cassandra store it in an easily retrievable manner?
● Will the data scale well and not break Cassandra?
About Your Data...
● Data partitioning strategy
● Know how you need to search your data
● Limit the number of updates and deletes on data that must be indexed
● Denormalize ALL THE THINGS
Things C* Does Well
● Non-relational Data
● Permanent Data
● Storing Data as it should be viewed
Things C* Does NOT Do Well
● Constructible Views Across Data
● Queue-like Data Patterns
● Highly Volatile Indexed Data
Searching Your Data
● Do not rely on a single column family to handle all lookups
● A single set of data can have multiple column families, one for each way you need to look up the data
● Avoid secondary indexes in almost all use cases
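The "one column family per lookup path" idea above can be sketched in a few lines. This is an in-memory illustration only, not a Cassandra client: each dict stands in for a column family, and the names `users_by_id`, `users_by_email`, and `save_user` are hypothetical.

```python
# Illustrative in-memory model only; this is not a Cassandra client.
# Each dict stands in for one column family, keyed by its partition key.
users_by_id = {}
users_by_email = {}


def save_user(user_id, email, name):
    """Write the same record once per lookup path (denormalized)."""
    record = {"user_id": user_id, "email": email, "name": name}
    users_by_id[user_id] = dict(record)
    users_by_email[email] = dict(record)


def find_by_id(user_id):
    """Lookup served entirely by the id-keyed column family."""
    return users_by_id.get(user_id)


def find_by_email(email):
    """Lookup served entirely by the email-keyed column family."""
    return users_by_email.get(email)


save_user("u1", "ada@example.com", "Ada")
```

Each read goes straight to the structure keyed by the value being searched, so no secondary index is needed. The cost is an extra write per lookup path.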
Searching Your Data (continued)
● Avoid indexing volatile data
● Limit your lookups to single partitions where possible
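A rough sketch of why single-partition lookups are cheaper: the partition key decides which node holds the data, so one key means one place to look. This is a deliberate simplification that assumes a four-node cluster, an MD5-based hash, and no replication. The node names and the `node_for` and `nodes_touched` helpers are hypothetical, and Cassandra's real partitioner and token ring work differently.

```python
import hashlib

# Hypothetical four-node cluster, used only for illustration.
NODES = ["node-a", "node-b", "node-c", "node-d"]


def node_for(partition_key):
    """Map a partition key to one node by hashing it (simplified)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


def nodes_touched(partition_keys):
    """Return the set of nodes a query over these keys must contact."""
    return {node_for(key) for key in partition_keys}


single = nodes_touched(["user:42"])
many = nodes_touched([f"user:{i}" for i in range(100)])
```

Here `single` always contains exactly one node, while a query spanning many partition keys will typically fan out across most of the cluster.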
Tombstones
How to Kill Your Cassandra Service
Tombstones
● Cassandra’s mechanism for handling deletes in a distributed fashion
● Created whenever a row or column is deleted or an indexed value is updated
● Essentially timestamped soft deletes
● Can cause your lookups to fail inexplicably when a single query reads too many of them (100,000)
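The behaviour described above can be modelled with a toy sketch. It is not Cassandra's storage engine: the `Partition` class, the `TombstoneOverflow` exception, and the use of `None` as the tombstone marker are all assumptions made for illustration. The threshold constant simply reuses the 100,000 figure from the slide.

```python
# Toy model of tombstones; not Cassandra's actual storage engine.
TOMBSTONE_FAILURE_THRESHOLD = 100_000  # figure quoted on the slide


class TombstoneOverflow(Exception):
    """Raised when a read scans more tombstones than the threshold."""


class Partition:
    def __init__(self):
        # column name -> (timestamp, value); value None marks a tombstone
        self.cells = {}

    def write(self, column, value, timestamp):
        # Last write wins, decided by timestamp.
        current = self.cells.get(column)
        if current is None or timestamp >= current[0]:
            self.cells[column] = (timestamp, value)

    def delete(self, column, timestamp):
        # A delete is a timestamped write of a tombstone marker.
        self.write(column, None, timestamp)

    def read(self, threshold=TOMBSTONE_FAILURE_THRESHOLD):
        live, tombstones = {}, 0
        for column, (_, value) in self.cells.items():
            if value is None:
                tombstones += 1
                if tombstones > threshold:
                    raise TombstoneOverflow(
                        f"scanned more than {threshold} tombstones"
                    )
            else:
                live[column] = value
        return live


# Queue-like usage: write five jobs, then delete four of them.
queue = Partition()
for i in range(5):
    queue.write(f"job{i}", "payload", timestamp=i)
for i in range(4):
    queue.delete(f"job{i}", timestamp=10 + i)
```

Reading `queue` still has to step over four tombstones to find the one live job. With a low enough threshold, for example `queue.read(threshold=3)`, the read fails outright, which is the failure mode that queue-like and delete-heavy data models run into.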
Managing Tombstones
● Avoid data models that:
  ○ Update indexed columns
  ○ Have too many deletes
  ○ Need to query data across partitions
● Try to make your data as immutable as possible
● Fine-tune your garbage collection settings
Operationalization
Maintaining a C* Cluster
Operationalization
● Cassandra requires more maintenance than most RDBMS
● Strange, difficult to debug issues will arise when your cluster is neglected
● Need to perform maintenance jobs regularly to keep cluster healthy and consistent
● Possibly perform major compactions to help keep reads performant
Thank You