Cassandra Summit 2014: Deploying Cassandra for Call of Duty
DEMONWARE
Deploying Cassandra for Call of Duty
#CassandraSummit
Tim Czerniak, Software Engineer, DemonWare
Seán O Sullivan, Operations Engineer, DemonWare
DEMON-WHO?
DemonWare is a subsidiary of Activision-Blizzard
We write, deploy and maintain client and server applications for Activision and Blizzard games
SERVICES
• Matchmaking
• Leaderboards
• Chat
• File Storage
• Leagues
• Social Network Integration
• etc…
TECHNOLOGIES
• Client: C++, HTTP
• Server: Python, Erlang, MySQL, CentOS, Puppet
OUR UNUSUAL USE CASE
[Figure: timeline annotated with Release, First weekend, Christmas, Peak]
“By failing to prepare, you are preparing to fail.”
– Benjamin Franklin
OUR PREDICAMENT
Needed to share data cross-DC…
…but MySQL isn’t so good at that.
SERVICES
• Progress store
  • High write, low read
  • File size ~4KB
  • Persistent
• Presence
  • High write, high read
  • Data size minimal
  • Transient
• Messaging
  • Low write, low read
  • Transient
REQUIREMENTS
• Cross-DC
• Ease of consolidation and expansion
• Manageability for the operations teams
• Throughput
  • Storage: 1,500,000 reqs/min
  • Presence: 250,000 reqs/min
  • Messaging: 850,000 reqs/min
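The throughput requirements are easier to compare per second. The sketch below converts them and, purely as an illustrative upper bound, assumes every storage request carries a ~4KB payload (the progress-store file size); that assumption and the names in the code are not from the talk.

```python
# Peak throughput requirements from the slide, in requests per minute.
REQUIREMENTS_PER_MIN = {
    "Storage": 1_500_000,
    "Presence": 250_000,
    "Messaging": 850_000,
}

# Assumption for illustration only: every storage request carries a ~4KB
# payload. This is an upper bound, since some storage requests are reads.
PAYLOAD_BYTES = 4 * 1024


def per_second(per_minute: int) -> float:
    """Convert a requests-per-minute figure to requests per second."""
    return per_minute / 60


def main() -> None:
    total = 0.0
    for service, rpm in REQUIREMENTS_PER_MIN.items():
        rps = per_second(rpm)
        total += rps
        print(f"{service:<10} {rps:>10,.0f} req/s")
    print(f"{'Total':<10} {total:>10,.0f} req/s")

    # Cluster-wide payload bits per second for storage, before replication.
    storage_bits = per_second(REQUIREMENTS_PER_MIN["Storage"]) * PAYLOAD_BYTES * 8
    print(f"Storage payload upper bound: {storage_bits / 1e6:,.0f} Mb/s")


if __name__ == "__main__":
    main()
```

Run as a script, it reports roughly 25,000, 4,167 and 14,167 req/s (about 43,333 req/s in total) and an upper bound of about 819 Mb/s of storage payload cluster-wide, before replication.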
EVALUATION
• Shortlisted suitable options
  • Riak
  • Cassandra
• Rewrote our application backend, twice
LOAD TESTING
• Two clusters
  • Single CPU, SSDs and average memory
  • Dual CPU, spindles and high memory
• Used realistic user profiles
• Included peaks and troughs during testing
• Ran a soak test
THE WINNER???
• Initially Riak was a slam dunk
  • Erlang-based (we know Erlang)
  • Tooling is excellent
  • Performed well
  • Previously evaluated
THE WINNER
• Cassandra won in the end
  • Write performance
  • Richer feature set
  • Maturity of codebase and tooling
• Testing continued 24/7 until launch
SCHEMA
• Progress store
  • A perfect fit!
• Presence
  • More relational
  • High throughput (Tombstones!)
  • TTLs
• Messaging
  • Time-series data, well suited
  • Tombstones!
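The schema points (time-series messaging, TTLs, partition and clustering keys) can be illustrated with a small in-memory model. This is a hypothetical sketch, not DemonWare's schema: the table name, the columns and the per-day bucketing of each recipient's messages are assumptions chosen to show how a compound partition key bounds partition size while a clustering key keeps rows time-ordered.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical CQL table this sketch models (not DemonWare's actual schema):
#   CREATE TABLE messages (
#       recipient_id bigint,
#       day          text,
#       sent_at      timestamp,
#       body         text,
#       PRIMARY KEY ((recipient_id, day), sent_at)
#   ) WITH CLUSTERING ORDER BY (sent_at DESC);
# Rows would be written with "USING TTL <seconds>" so they expire on their own.


def partition_key(recipient_id: int, sent_at: datetime) -> tuple:
    """Compound partition key: one partition per recipient per UTC day,
    so no single partition grows without bound."""
    return (recipient_id, sent_at.astimezone(timezone.utc).strftime("%Y-%m-%d"))


class MessageStore:
    """In-memory stand-in for the hypothetical table above."""

    def __init__(self, ttl_seconds: int) -> None:
        self.ttl_seconds = ttl_seconds
        # partition key -> rows kept sorted by sent_at (the clustering key)
        self.partitions = defaultdict(list)

    def insert(self, recipient_id: int, sent_at: datetime, body: str) -> None:
        rows = self.partitions[partition_key(recipient_id, sent_at)]
        bisect.insort(rows, (sent_at, body))

    def latest(self, recipient_id: int, day: datetime, now: datetime, limit: int = 10) -> list:
        """Newest-first read of a single partition, hiding rows past their TTL."""
        rows = self.partitions.get(partition_key(recipient_id, day), [])
        live = [(ts, body) for ts, body in rows
                if (now - ts).total_seconds() < self.ttl_seconds]
        return [body for _, body in reversed(live)][:limit]
```

The model simply hides expired rows. In Cassandra itself, expired cells are not dropped immediately: they become tombstones that reads must still skip until compaction purges them after `gc_grace_seconds`, which is the cost the slide's "Tombstones!" warns about for high-churn data.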
SCHEMA: LESSONS LEARNED
• Keep it simple
  • It’s not a relational DB
  • Get your partition keys and clustering keys right
  • C* will do what it does best
• Don’t ignore the CAP theorem
  • Cassandra has tuneable consistency, but there will be trade-offs
• Load test with real numbers
  • Some issues aren’t evident in unit tests
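Cassandra's tuneable consistency comes down to replica arithmetic. A minimal sketch, assuming a single data centre and a replication factor of 3 chosen for illustration (the talk does not state the RF used):

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for QUORUM: a majority of the replication factor."""
    return replication_factor // 2 + 1


def is_strongly_consistent(rf: int, read_replicas: int, write_replicas: int) -> bool:
    """A read is guaranteed to see the latest acknowledged write when the
    read and write replica sets must overlap, i.e. R + W > RF."""
    return read_replicas + write_replicas > rf


def tolerable_failures(rf: int, replicas_needed: int) -> int:
    """Replicas of a key that can be down while a request still succeeds."""
    return rf - replicas_needed


if __name__ == "__main__":
    rf = 3  # assumption for illustration
    q = quorum(rf)
    for label, r, w in [("ONE/ONE", 1, 1), ("QUORUM/QUORUM", q, q), ("ONE/ALL", 1, rf)]:
        print(f"{label:<14} R+W>RF: {is_strongly_consistent(rf, r, w)}")
    print("Replicas that may be down at QUORUM:", tolerable_failures(rf, q))
```

With RF=3, QUORUM reads and writes overlap and still succeed with one replica of a key down; ONE/ONE is cheaper but can return stale data. Across data centres, LOCAL_QUORUM applies the same majority rule within the local DC only, trading cross-DC consistency for latency.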
CONFIG
• Default settings are probably not what you want
• Changed many settings off the bat
• Reverted some (oops)
HARDWARE
• 2x Intel Xeon E5-2620 @ 2GHz
• 2x 480GB SSD (RAID-1)
• 32GB RAM
• 1Gb/s non-dedicated network
MONITORING
• Graphite
• Nagios
• Jolokia
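Graphite accepts metrics over a simple plaintext protocol, one `<path> <value> <timestamp>` line per metric, by default on Carbon's TCP port 2003. The sketch below shows how an application-side service might emit its own counters alongside the JVM-level metrics; the metric path and host are hypothetical, and the talk does not say whether DemonWare's Python services reported this way.

```python
import socket
import time
from typing import Optional


def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one metric in Graphite's plaintext protocol:
    "<metric.path> <value> <unix timestamp>" followed by a newline."""
    if " " in path:
        raise ValueError("metric paths must not contain spaces")
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"


def send_metrics(lines: list, host: str, port: int = 2003) -> None:
    """Push pre-formatted lines to a Carbon plaintext listener over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("ascii"))


if __name__ == "__main__":
    # Hypothetical metric path; actually sending needs a reachable Carbon host.
    print(graphite_line("demonware.presence.writes_per_min", 250000), end="")
```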
GOTCHAS
• Vnodes and rack awareness
• Load balancers
• Dev differs from production (of course...)
• Launching in a DC we didn't load test in
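The vnodes-and-rack-awareness gotcha is easier to see with a model of replica placement. This is a simplified, single-data-centre sketch in the spirit of Cassandra's NetworkTopologyStrategy, not its actual code: walk the token ring clockwise from the key's token, take one host per rack first, then fall back to skipped hosts if there are fewer racks than replicas. The `VNode` type and the host and rack names are hypothetical.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class VNode:
    token: int   # position on the ring
    host: str    # physical node owning this vnode
    rack: str


def replicas_for(token: int, ring: list, rf: int) -> list:
    """Pick rf replica hosts for a key token within one data centre."""
    ring = sorted(ring, key=lambda v: v.token)
    tokens = [v.token for v in ring]
    # The first vnode whose token is >= the key's token owns it (wrapping around).
    start = bisect.bisect_left(tokens, token) % len(ring)

    # Distinct hosts in clockwise order: each host owns many vnodes.
    walk = []
    seen_hosts = set()
    for i in range(len(ring)):
        v = ring[(start + i) % len(ring)]
        if v.host not in seen_hosts:
            seen_hosts.add(v.host)
            walk.append(v)

    chosen, racks_used, skipped = [], set(), []
    for v in walk:
        if len(chosen) == rf:
            break
        if v.rack not in racks_used:
            chosen.append(v.host)
            racks_used.add(v.rack)
        else:
            skipped.append(v.host)  # same rack as an earlier replica

    # Fewer racks than rf: top up with skipped hosts, in ring order.
    for host in skipped:
        if len(chosen) == rf:
            break
        chosen.append(host)
    return chosen
```

When the number of racks equals the replication factor, each rack receives one replica of every row, so a rack with fewer hosts than the others concentrates more data on each of its nodes. With vnodes scattering every host's tokens around the ring, keeping racks the same size matters.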
LAUNCH
• Request to simulate a node failure
• Two nodes died over Christmas
• Expanding to other titles
QUESTIONS?
APPENDIX
cassandra.yaml (Puppet ERB template):
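# Join without streaming existing data (typical when building a new cluster)
auto_bootstrap: false
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
# fsync at intervals during sequential writes; generally a good idea on SSDs
trickle_fsync: true
# Half-synchronous, half-asynchronous Thrift server
rpc_server_type: hsha
# `virtual` and `processorcount` are Facter facts; "physical" means bare metal
<% if virtual == "physical" -%>
concurrent_reads: 128
<% else -%>
concurrent_reads: 32
<% end -%>
# Eight write threads per processor reported by Facter; a plain %> keeps the
# line break, so the next key does not get glued onto this line
concurrent_writes: <%= processorcount.to_i * 8 %>
multithreaded_compaction: false
# 0 disables compaction throttling
<% if virtual == "physical" -%>
compaction_throughput_mb_per_sec: 0
<% else -%>
compaction_throughput_mb_per_sec: 16
<% end -%>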
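The template resolves differently per host. A sketch mirroring its conditionals in Python, where `virtual` and `processorcount` stand in for the Facter facts the ERB reads. As a worked example, assuming Facter counts logical processors and Hyper-Threading is enabled, the two six-core E5-2620s in the hardware list report 24 processors, giving `concurrent_writes: 192`.

```python
def render_settings(virtual: str, processorcount: int) -> dict:
    """Mirror the ERB template's conditionals for one host.

    `virtual` and `processorcount` stand in for the Facter facts the template
    reads; "physical" means bare metal.
    """
    physical = virtual == "physical"
    return {
        "auto_bootstrap": False,
        "trickle_fsync": True,
        "rpc_server_type": "hsha",
        "concurrent_reads": 128 if physical else 32,
        "concurrent_writes": processorcount * 8,
        "multithreaded_compaction": False,
        # 0 turns compaction throttling off entirely
        "compaction_throughput_mb_per_sec": 0 if physical else 16,
    }


if __name__ == "__main__":
    # Assumed: 2x six-core E5-2620 with Hyper-Threading, so 24 logical processors.
    print(render_settings("physical", 24))
    # A VM ("kvm" is one value Facter can report for `virtual`) with 4 vCPUs.
    print(render_settings("kvm", 4))
```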
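cassandra-env.sh:
# Per-thread stack size (-Xss) for physical and virtual hosts
<% if virtual == "physical" -%>
JVM_OPTS="$JVM_OPTS -Xss180k"
<% else -%>
JVM_OPTS="$JVM_OPTS -Xss228k"
<% end -%>
# Java agents: Graphite metrics reporter, and Jolokia (JMX over HTTP) on port 8080
JVM_EXTRA_OPTS="$JVM_EXTRA_OPTS -javaagent:/usr/share/java/graphite-reporter-agent.jar -javaagent:/usr/share/java/jolokia-jvm-agent.jar=port=8080,host=<%= hostname %>"
# Graphite reporter jar for the Metrics library
EXTRA_CLASSPATH="/usr/share/java/metrics-graphite-2.0.3.jar"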
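The Jolokia agent in `cassandra-env.sh` exposes Cassandra's JMX MBeans over HTTP on port 8080. A minimal sketch of reading one attribute through Jolokia's GET `read` endpoint, assuming the JVM agent's default `/jolokia` context path; the host name is hypothetical, and `StorageService`'s `LiveNodes` attribute is used only as an example.

```python
import json
import urllib.parse
import urllib.request


def jolokia_read_url(host: str, mbean: str, attribute: str, port: int = 8080) -> str:
    """Build a Jolokia GET 'read' URL for one MBean attribute.

    Inside path parts Jolokia escapes '!' as '!!' and '/' as '!/'.
    """
    def escape(part: str) -> str:
        part = part.replace("!", "!!").replace("/", "!/")
        return urllib.parse.quote(part, safe="!/:=,")

    return f"http://{host}:{port}/jolokia/read/{escape(mbean)}/{escape(attribute)}"


def read_attribute(host: str, mbean: str, attribute: str, port: int = 8080):
    """Fetch the attribute and return Jolokia's "value" field."""
    url = jolokia_read_url(host, mbean, attribute, port)
    with urllib.request.urlopen(url, timeout=5) as resp:
        payload = json.load(resp)
    if payload.get("status") != 200:
        raise RuntimeError(payload.get("error", "Jolokia request failed"))
    return payload["value"]


if __name__ == "__main__":
    # Needs a reachable node running the Jolokia agent; "cass-node-1" is hypothetical.
    print(jolokia_read_url("cass-node-1", "org.apache.cassandra.db:type=StorageService", "LiveNodes"))
```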