Comparing NoSQL Databases for Operational Workloads
How to Compare NoSQL Databases
Tradeoffs between performance and reliability
Ben Engber, Thumbtack Technology
Sponsored by
Who are we and what do we want?
Consulting company with focus on scalability
Long background of “no SQL”
Production deployments across many NoSQL vendors
Engineering staff of 50
Ongoing research teams
Advise people on which solutions to use
Advertised Features
MongoDB (http://www.mongodb.org/about/introduction/)
Flexibility: JSON documents, dynamic schema
Power: secondary indexes, dynamic queries, rich updates, easy aggregation
Speed/Scaling
Ease of use
Cassandra (http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/apache-cassandra)
Elastic scalability
Linear performance
Flexible, dynamic schema
Multiple datacenter and cloud readiness
Tunable data consistency
Basic transaction support
Why use NoSQL at all?
“Because I’ve heard of it”
“I want rapid application development”
“I want to do something with Big Data”
Operational Workload: high throughput; multi-user (concurrency); integrity and consistency; small, simple operations
Analytic Workload: ad hoc analysis; batch operation on sets; Map-Reduce; machine learning, predictive analytics, etc.
Landscape of Operational NoSQL DBs
What’s the difference between an indexed “value” or “column” and a document?
Couchbase 1.x 2.x
Aerospike 2.x 3.x
What are we really asking?
“I want to support a large transaction volume”
“I want to distribute my data tier”
“I want simpler handling of failover”
“I want to scale my data tier horizontally”
Key-Value Stuff
What about other queries?
Shards A,B,C
Shards D,E,F
Shards G,H,I
Shards J,K,L
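One way to read the shard diagram, offered as a hedged sketch and not as any vendor's actual routing: a key-value operation hashes to exactly one shard, while a query on any other field has to visit every shard and merge the results. The class `ShardedStore`, its methods, and the sample `city` field are all hypothetical; the 12 shards stand in for A through L.

```python
import zlib

class ShardedStore:
    """Toy in-memory store split into a fixed number of shards (hypothetical)."""

    def __init__(self, shard_count=12):
        self.shards = [dict() for _ in range(shard_count)]

    def shard_for(self, key):
        # Stable hash so the same key always lands on the same shard.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def put(self, key, doc):
        self.shards[self.shard_for(key)][key] = doc

    def get(self, key):
        # Key-value read: exactly one shard is touched.
        return self.shards[self.shard_for(key)].get(key)

    def find(self, field, value):
        # Non-key query: every shard is scanned and the results are merged.
        touched = 0
        hits = []
        for shard in self.shards:
            touched += 1
            hits.extend(k for k, d in shard.items() if d.get(field) == value)
        return sorted(hits), touched

store = ShardedStore()
for i in range(100):
    store.put(f"user{i}", {"city": "NYC" if i % 10 == 0 else "SF"})

nyc_keys, shards_touched = store.find("city", "NYC")
```

The point of the sketch is the cost difference: `get` does constant work regardless of cluster size, while `find` grows with the number of shards.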
So, we’re focusing on scale
How should we measure operational data?
The Plan – Start Simple
Test a bunch of databases
Start with a nice simple workload (key value storage)
Use a standard client (YCSB)
Then move on to: secondary indexes; even more databases; failover
Running a database is easy – running it correctly is hard
Memory sizing, problem sizing, etc.
Consistency tradeoffs
Eviction
Hardware utilization
These databases work in very different ways
CAP Theorem
Consistency / Availability is somewhat academic
Your application needs both
HTTP Caches
These databases are tunable
What to think about instead?
Consistency: immediate/eventual, convergence, isolation
Durability
Data loss
Failover
Latency
Availability (downtime)
There is a spectrum of choices
Fast <-> Reliable
Most NoSQL databases can sit in multiple places on this spectrum
Choose the databases we hear about most often
Create standard baseline scenarios
Measure raw performance for various scenarios
Examine how they fail over
How do databases achieve these guarantees?
6 nodes
6 “shards”: A, B, C, D, E, F
Replication factor of 3
2 Scenarios: “Fast” and “Reliable”
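A minimal sketch of how 6 shards with a replication factor of 3 can be spread over 6 nodes. It assumes ring-style placement (each node holds its own shard plus the next two), which reproduces the A,B,C / B,C,D / ... layout drawn in the quorum and consensus diagrams of this deck. The function names are hypothetical.

```python
NODES = 6
SHARDS = ["A", "B", "C", "D", "E", "F"]
REPLICATION_FACTOR = 3

def shards_on_node(node):
    """Shards held by a 1-based node number: its own shard plus the next RF-1."""
    start = node - 1
    return [SHARDS[(start + i) % len(SHARDS)] for i in range(REPLICATION_FACTOR)]

def nodes_for_shard(shard):
    """All nodes that hold a copy of the given shard."""
    return [n for n in range(1, NODES + 1) if shard in shards_on_node(n)]

layout = {n: shards_on_node(n) for n in range(1, NODES + 1)}
```

With this placement every shard lives on exactly three nodes, so losing any single node still leaves two copies of each shard it held.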
Master-Slave (MySQL, MongoDB)
Node 1: Master A
Node 2: Slave A
Node 3: Slave A
Node 4: Master B
Node 5: Slave B
Node 6: Slave B
Client 1: write master, read master (write row A quickly)
Client 2: write master and observe, read master (write row B durably)
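The difference between the two clients can be sketched with a toy in-memory model. This is an illustration under stated assumptions, not MySQL's or MongoDB's replication protocol: `ReplicaSet` and its methods are hypothetical, and asynchronous replication is modeled as a queue that is only applied when pumped.

```python
class ReplicaSet:
    """Toy master with two slaves; replication is applied only when pumped."""

    def __init__(self, slave_count=2):
        self.master = {}
        self.slaves = [dict() for _ in range(slave_count)]
        self.pending = []  # writes accepted by the master, not yet replicated

    def write_fast(self, key, value):
        # "Write master": acknowledge as soon as the master has the value.
        self.master[key] = value
        self.pending.append((key, value))

    def pump_replication(self):
        # Stand-in for the asynchronous replication stream.
        for key, value in self.pending:
            for slave in self.slaves:
                slave[key] = value
        self.pending.clear()

    def write_durable(self, key, value):
        # "Write master and observe": do not acknowledge until every slave
        # has the value. Modeled here by draining the replication queue.
        self.write_fast(key, value)
        self.pump_replication()

    def fail_master(self):
        # Promote the first slave; anything still pending is lost.
        self.master = self.slaves.pop(0)
        self.pending.clear()

rs = ReplicaSet()
rs.write_durable("B", 2)  # acknowledged only after replication
rs.write_fast("A", 1)     # acknowledged immediately
rs.fail_master()          # master dies before "A" reaches a slave
```

After the failover the promoted node holds row B but not row A, which is the data-loss window the fast path accepts in exchange for lower write latency.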
Shard Master (Couchbase)
Node 1: Master A; Slave B,C
Node 2: Master B; Slave C,D
Node 3: Master C; Slave D,E
Node 4: Master D; Slave E,F
Node 5: Master E; Slave F,A
Node 6: Master F; Slave A,B
Client: write master, read master (write row A quickly)
Client: write master and observe, read master (write row D durably)
Tunable Quorum (Cassandra, Riak)
Node 1: A,B,C
Node 2: B,C,D
Node 3: C,D,E
Node 4: D,E,F
Node 5: E,F,A
Node 6: F,A,B
Client 1: write one, read one
Client 2: write one, read one
Client 3: write one, read one
Client 4: write all, read one
Client 5: write one, read all
Client 6: write quorum, read quorum
Read/Write row A quickly
Read/Write row D consistently
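The usual rule of thumb behind tunable quorums is that a read is guaranteed to overlap the latest acknowledged write when W + R > N, where N is the replication factor. The sketch below applies that rule to the six client configurations in the diagram; the function names are hypothetical and the level names are simplified to "one", "quorum", and "all".

```python
REPLICATION_FACTOR = 3

def replicas_required(level, n=REPLICATION_FACTOR):
    """Number of replicas that must respond for a consistency level."""
    if level == "one":
        return 1
    if level == "quorum":
        return n // 2 + 1
    if level == "all":
        return n
    raise ValueError(f"unknown level: {level}")

def reads_see_latest_write(write_level, read_level, n=REPLICATION_FACTOR):
    # If W + R > N, every read set shares at least one replica with every write set.
    w = replicas_required(write_level, n)
    r = replicas_required(read_level, n)
    return w + r > n

# (write level, read level) per client, as listed in the diagram.
CLIENTS = {
    "Client 1": ("one", "one"),
    "Client 2": ("one", "one"),
    "Client 3": ("one", "one"),
    "Client 4": ("all", "one"),
    "Client 5": ("one", "all"),
    "Client 6": ("quorum", "quorum"),
}

results = {name: reads_see_latest_write(w, r) for name, (w, r) in CLIENTS.items()}
```

Clients 4, 5, and 6 satisfy the overlap rule in different ways: they shift the cost to writes, to reads, or split it between both.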
Transactional Consensus (Aerospike, FoundationDB, Cassandra 2.0)
Node 1: A,B,C
Node 2: B,C,D
Node 3: C,D,E
Node 4: D,E,F
Node 5: E,F,A
Node 6: F,A,B
Client 1: fire and forget
Client 2: fire and forget
Client 3: fire and forget
Read/Write row A quickly
Client 4: transactional
Client 5: transactional
Read/Write row D ACIDly
Quick and Dirty Conclusion
Systems like MongoDB and Couchbase trade speed for Durability
Systems like Cassandra and Riak and Aerospike trade speed for Consistency
Systems like Aerospike and FoundationDB trade speed for ACID (or parts of it)
Consistency
“In distributed data systems like Cassandra, [consistency] usually means that once a writer has written, all readers will see that write.”
ACIDity
Row-level (CAS)
Multi-key
Long running transactions
Old Value
New Value
Old Value
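Row-level CAS (compare-and-set) is the smallest of the three ACID scopes listed above. A minimal single-threaded sketch of the idea follows; `VersionedRow` and `increment` are hypothetical names, and a real store would perform the version check atomically on the server.

```python
class VersionedRow:
    """Single row guarded by a version counter (hypothetical, in-memory)."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def compare_and_set(self, expected_version, new_value):
        # Apply the write only if nobody changed the row since it was read.
        if self.version != expected_version:
            return False
        self.value = new_value
        self.version += 1
        return True

def increment(row, retries=10):
    """Read-modify-write loop that retries when it loses a race."""
    for _ in range(retries):
        value, version = row.read()
        if row.compare_and_set(version, value + 1):
            return True
    return False

row = VersionedRow(10)
_, stale_version = row.read()   # a slow writer reads version 0
increment(row)                  # another writer gets in first
lost_update = row.compare_and_set(stale_version, 99)  # rejected: version moved on
```

The stale writer is refused instead of silently overwriting the newer value, which is the guarantee row-level CAS provides without any multi-key transaction machinery.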
Reliability Spectrum
Configuration         Replication  Consistency  Data loss on node failure  Availability on no quorum  Data loss on replica set failure
Aerospike (fast)      async        eventual     yes                        available                  25%
Cassandra (fast)      async        eventual     yes                        available                  25%
Couchbase (fast)      async        immediate    yes                        available                  25%
MongoDB (fast)        async        immediate    yes                        available                  50%
Aerospike (reliable)  sync         immediate    no                         available                  25%
Cassandra (reliable)  sync         immediate    no                         unavailable                25%
MongoDB (reliable)    sync         immediate    no                         unavailable                50%
Create Baselines
Fast: asynchronous replication; asynchronous writes to disk; data set fits in RAM; immediate or eventual consistency
Reliable: synchronous replication; synchronous or asynchronous writes to disk; data set larger than RAM; immediate consistency (+)
Performance Tests
1. Install a database on a 4-node cluster (replication factor of 2)
2. Load a sizable dataset (500M rows) to SSD (“reliable”)
3. Determine maximum load
4. Perform a stepwise load for latency
5. Repeat for read-heavy and balanced read-write
6. Repeat steps 3-5 for a dataset that fits into RAM (“fast”)
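Steps 3 and 4 can be sketched as a small driver loop. Everything here is an assumption made for illustration: `fake_cluster` is a stand-in model for a real benchmark run (it is not YCSB and not a measured result), and the step size and load fractions are arbitrary choices.

```python
def fake_cluster(target_ops, capacity=200_000):
    """Stand-in for one benchmark run (assumption): returns
    (achieved ops/sec, average latency in ms) for a requested rate."""
    achieved = min(target_ops, capacity)
    utilization = achieved / capacity
    latency_ms = 0.5 / max(1.0 - utilization, 0.05)  # latency climbs near saturation
    return achieved, latency_ms

def find_max_throughput(run, start=25_000, step=25_000):
    """Step 3: raise the requested rate until the cluster stops keeping up."""
    target = start
    best = 0
    while True:
        achieved, _ = run(target)
        if achieved < target:  # the cluster fell behind the requested rate
            return max(best, achieved)
        best = achieved
        target += step

def stepwise_latency(run, max_ops, fractions=(0.25, 0.5, 0.75, 1.0)):
    """Step 4: measure average latency at fixed fractions of the maximum."""
    return [(int(max_ops * f), run(int(max_ops * f))[1]) for f in fractions]

max_ops = find_max_throughput(fake_cluster)
curve = stepwise_latency(fake_cluster, max_ops)
```

In a real run, `fake_cluster` would be replaced by a function that launches the load generator at the requested rate and parses its report; the two driver functions would stay the same.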
Maximum Throughput
[Chart, SSD / Synchronous: Balanced and Read-Heavy workloads; axis 0 to 350,000; series Aerospike, Cassandra, MongoDB]
[Chart, RAM / Asynchronous: Balanced and Read-Heavy workloads; axis 0 to 1,000,000; series Aerospike, Cassandra, MongoDB, Couchbase 1.8, Couchbase 2.0]
Latency Scenarios
SSD / Synchronous
[Chart: Balanced Workload Read Latency (Full view). X axis: Throughput, ops/sec (0 to 200,000). Y axis: Average Latency, ms (0 to 10). Series: Aerospike, Cassandra, MongoDB]
[Chart: Balanced Workload Update Latency (Full view). X axis: Throughput, ops/sec (0 to 200,000). Y axis: Average Latency, ms (0 to 16). Series: Aerospike, Cassandra, MongoDB]
RAM / Asynchronous
[Chart: Balanced Workload Read Latency (Full view). X axis: Throughput, ops/sec (0 to 400,000). Y axis: Average Latency, ms (0 to 20). Series: Aerospike, Couchbase 1.8, Couchbase 2.0, Cassandra, MongoDB]
[Chart: Balanced Workload Update Latency (Full view). X axis: Throughput, ops/sec (0 to 400,000). Y axis: Average Latency, ms (0 to 8). Series: Aerospike, Couchbase 1.8, Couchbase 2.0, Cassandra, MongoDB]
Cluster Availability
Things to consider: replication delay; cluster downtime; data loss
Things to test: graceful shutdown; kill -9; split brain
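Cluster downtime in a test like this can be derived from the client's own operation log. The sketch below assumes a hypothetical log of (timestamp in ms, succeeded) pairs and defines an outage as the span from the first failed operation to the next successful one; the sample log is made up for illustration and is not a measured result.

```python
def downtime_windows(events):
    """events: (timestamp_ms, succeeded) pairs sorted by time.
    Returns (start_ms, end_ms) spans from a first failure to the next success.
    An outage still open at the end of the log is not reported."""
    windows = []
    outage_start = None
    for ts, ok in events:
        if not ok and outage_start is None:
            outage_start = ts
        elif ok and outage_start is not None:
            windows.append((outage_start, ts))
            outage_start = None
    return windows

# Made-up client log: one long outage (node killed) and one short blip.
log = [
    (0, True), (100, True),
    (200, False), (300, False), (2700, True),
    (2800, True),
    (2900, False), (3000, True),
]

windows = downtime_windows(log)
durations = [end - start for start, end in windows]
```

Taking `min(durations)` and `max(durations)` over repeated runs gives the kind of min/max downtime figure that a failover chart reports.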
Cluster: 4 nodes, 100% load, RAM, asynchronous (“Fast”)
[Charts: MongoDB, Aerospike, Couchbase, Cassandra]
Cluster: 4 nodes, 75% load, RAM, asynchronous (“Fast”)
[Charts: Aerospike; Cassandra, 6 node (QUORUM)]
Cluster: 4 nodes, 75% load, disk, synchronous (“Reliable”)
[Chart: Cassandra, 4 node (ALL)]
Downtimes on node down
[Chart: min/max downtime (ms), axis 0 to 14000, for Aerospike, Cassandra, Couchbase, MongoDB]
Do they fail over?
Downtime on node restore
[Chart: median downtime (ms), axis 0 to 35000, at 50%, 75%, and 100% of max throughput; series Aerospike, Cassandra, Couchbase, MongoDB]
Takeaways
NoSQL databases are converging feature-wise
For operational workloads, key-value is king
The Speed – Reliability spectrum is complex
Consider your application’s likely needs
Pick the best usability features within this window
Questions or Advice?
Thumbtack Technology Ben [email protected]
@bengber
http://www.thumbtack.net
Benchmarks and detailed discussion of methodology at: http://thumbtack.net/whitepapers