Shift: Real World Migration from MongoDB to Cassandra
SHIFT.com: Migrating from MongoDB to Cassandra
by Blake Eggleston & Jon Haddad
What is SHIFT.com?
Shift is a platform that enables marketers to communicate across organizations and departments in one single place.
It’s also an open application platform with a set of applications built on top of it that can communicate with one another.
Initial Stack
● Python
  ○ Flask
  ○ Celery
● MongoDB
  ○ mongoengine
● Neo4j / Titan
  ○ Bulbs
  ○ thunderdome
● Redis
● AWS
  ○ m1.xlarge for mongo
Current Stack
● Python
  ○ still flask
  ○ still celery
  ○ gevent (it rocks)
● Cassandra
  ○ 1.2.6
  ○ cqlengine
● ElasticSearch
● Redis
  ○ jondis
● AWS
  ○ m1.xlarge
Why did we move to Cassandra?
● Operational Benefits
  ○ Adding and removing nodes is much easier, compared to Mongo’s shards
● Control over our Data on Disk (LSMT)
● Love CQL3
● Long term scalability
  ○ Scales Linearly
  ○ Multi DC Support Baked in
Migration Goals
● Zero downtime
  ○ We wanted to roll out Cassandra without any service interruptions
● No loss of performance
  ○ By carefully structuring our schema we were able to match MongoDB’s performance.
Migration Strategy
Benefits of CQL3
● Easy to understand if you’re coming from RDBMS
● Collections
  ○ sets, lists, maps
● Batch Queries
● Clustering Keys
  ○ Handles ordering of logical rows
  ○ Saved us from column name management scheme and allowed us to focus on our data
Physical vs Logical Row
Single Row
Clustered Row
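One way to picture the difference, as a rough sketch rather than Cassandra's actual storage code: with a clustering key, the logical rows of a partition are flattened into cells of one physical row, and each cell name is prefixed with the clustering value so the cells stay sorted. The function name and sample data below are hypothetical.

```python
from collections import defaultdict


def to_physical_rows(logical_rows, partition_key, clustering_key):
    """Flatten logical rows into one sorted list of cells per partition.

    Hypothetical illustration only: each cell is named
    (clustering value, column name), which keeps a partition's
    logical rows ordered by clustering key.
    """
    partitions = defaultdict(list)
    for row in logical_rows:
        for column, value in row.items():
            if column in (partition_key, clustering_key):
                continue
            cell_name = (row[clustering_key], column)
            partitions[row[partition_key]].append((cell_name, value))
    # Cells within a physical row are kept sorted by cell name.
    return {
        pk: sorted(cells, key=lambda cell: cell[0])
        for pk, cells in partitions.items()
    }


logical_rows = [
    {"user_id": "u1", "message_id": 2, "message": "second"},
    {"user_id": "u1", "message_id": 1, "message": "first"},
    {"user_id": "u2", "message_id": 1, "message": "hello"},
]
physical = to_physical_rows(logical_rows, "user_id", "message_id")
```

Here the two logical rows for "u1" end up in a single physical row, ordered by message_id, without any hand-rolled column naming scheme.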
Data Modelling Patterns
● considerations: working with Mongo’s dbrefs and optimizing layout on disk
● structured tables as materialized views of the queries we planned on using
● moving multiple documents into a single physical row
● creating supporting index tables for looking up logical rows
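The last two patterns can be sketched with plain dictionaries standing in for tables. This is an assumption-laden illustration, not the SHIFT schema: the class, the owner/document naming, and the sample data are all hypothetical.

```python
class DocumentStore:
    """In-memory stand-in for a wide table plus a supporting index table.

    Hypothetical names throughout; plain dicts play the role of tables.
    """

    def __init__(self):
        # Wide table: owner_id -> {doc_id: document}, one physical row per owner.
        self.docs_by_owner = {}
        # Index table: doc_id -> owner_id, used to locate the physical row.
        self.owner_by_doc = {}

    def save(self, owner_id, doc_id, document):
        # Both tables are written together, as a batch query would do.
        self.docs_by_owner.setdefault(owner_id, {})[doc_id] = document
        self.owner_by_doc[doc_id] = owner_id

    def for_owner(self, owner_id):
        # The query the wide table was designed for: a single partition read.
        return list(self.docs_by_owner.get(owner_id, {}).values())

    def by_id(self, doc_id):
        # Two lookups: the index table first, then the owning partition.
        owner_id = self.owner_by_doc.get(doc_id)
        if owner_id is None:
            return None
        return self.docs_by_owner[owner_id][doc_id]


store = DocumentStore()
store.save("org-1", "doc-a", {"title": "Plan"})
store.save("org-1", "doc-b", {"title": "Budget"})
```

Calling store.by_id("doc-b") resolves the owner through the index and then reads the document from that owner's row, while store.for_owner("org-1") returns every document in one read.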
Time Series: Message Stream
● Users have tens of thousands of messages
● Each user’s message stream is specific to them, like a Twitter feed
● This is Cassandra’s strength - Time Series
● Considered Redis - but poor for multi-dc
create table news_feed (
    user_id uuid,
    message_id timeuuid,
    message text,  -- column type assumed; the slide omits it
    primary key (user_id, message_id)
);
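To show what this layout buys, here is a minimal in-memory model of the table, assuming user_id acts as the partition key and message_id orders the messages inside each partition. It is only a sketch: the NewsFeed class and its methods are hypothetical, and Python's uuid.uuid1() stands in for a CQL timeuuid.

```python
import uuid
from bisect import insort
from collections import defaultdict


class NewsFeed:
    """In-memory model of the news_feed table (illustration only)."""

    def __init__(self):
        # user_id -> list of (timestamp, raw id bytes, message), kept sorted.
        self._partitions = defaultdict(list)

    def add(self, user_id, message):
        # uuid1() is time-based, like a CQL timeuuid.
        message_id = uuid.uuid1()
        # Sort key: embedded timestamp, then raw bytes as a tie-breaker.
        insort(
            self._partitions[user_id],
            (message_id.time, message_id.bytes, message),
        )
        return message_id

    def latest(self, user_id, limit=10):
        # Newest first: read the tail of the sorted partition in reverse.
        if limit <= 0:
            return []
        rows = self._partitions[user_id][-limit:]
        return [
            (uuid.UUID(bytes=raw), message)
            for _, raw, message in reversed(rows)
        ]


feed = NewsFeed()
user = uuid.uuid4()
for text in ("first", "second", "third"):
    feed.add(user, text)
recent = feed.latest(user, limit=2)
```

Reading a user's most recent messages touches only that user's partition and a contiguous slice of it, which is the access pattern the primary key above is designed around.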
cqlengine
● cqlengine.org
● the Python CQL3 object-row mapper
● exposes CQL3 tables as Python classes
● maps columns to properties
● builds CQL queries
from cqlengine import columns
from cqlengine.models import Model

# model definition
class ExampleModel(Model):
    example_id = columns.UUID(primary_key=True)
    example_type = columns.Integer(index=True)
    created_at = columns.DateTime()
    description = columns.Text(required=False)

# example query
ExampleModel.objects(example_type=1)
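As a rough idea of what "builds CQL queries" means, the sketch below turns keyword filters into a parameterized SELECT. It is not how cqlengine is implemented; build_select and the example_model table name are hypothetical, and only equality filters are handled.

```python
def build_select(table, **filters):
    """Return (cql, params) for equality filters on a table.

    Hypothetical helper: values are returned as parameters rather than
    interpolated into the CQL string.
    """
    cql = "SELECT * FROM {}".format(table)
    params = []
    if filters:
        # Sort column names so the generated statement is deterministic.
        names = sorted(filters)
        cql += " WHERE " + " AND ".join("{} = %s".format(n) for n in names)
        params = [filters[n] for n in names]
    return cql, params


cql, params = build_select("example_model", example_type=1)
```

The resulting statement and parameter list would then be handed to a driver for execution, which is the part a real mapper takes care of.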
Improvements from moving to C*
● Operationally we’ve had zero problems
● Outstanding Performance
● Easy to build new features
● Community has been amazing (mailing list and #cassandra)
misc tips
● leveled compaction - good for read heavy workloads
● use secondary indexes sparingly, understand how they work and when to use them
● to reiterate, think about how you’re going to query your data
● use Elasticsearch / Solr for ad hoc queries
Contact Info
Jon Haddad @rustyrazorblade [email protected]
Blake Eggleston @blakeeggleston [email protected]
….we’re hiring!