Cassandra at scale
Apache Cassandra at Scale
Patrick McFadin | Solution Architect | DataStax
Saturday, July 13, 13
Who is this dude?
• Patrick McFadin
• Solution Architect at DataStax
• Cassandra MVP
• User for years
• Follow me for more: @PatrickMcFadin
I talk about Cassandra and building scalable, resilient apps ALL THE TIME!
What do you mean “at scale”?
• Personally involved in deployments of ~1000 nodes
• 0.5 PB of total storage
• Millions of transactions per second
• Critical lines of business
• Multiple datacenters
Time to scale
A few tips to help you get there
Scaling Busters: Disk IO
• Cassandra is (almost) never CPU bound
• Can your server do this?
[Diagram: one disk system serving a long sequential read and a long sequential write at the same time]
• No? You have trouble.
• Shared storage (NAS, iSCSI)
  - Just no. See above.
  - IOPS aren’t going to help
Scaling Busters: Spinning Disk Considerations
• Separate commit log and data disks
• Tune for reads and writes at the same time
  - Quick test while watching iostat:
    • Start a long read using dd
    • Start a long write using dd
    • Did one of them drop to the floor? #fail
• Consider JBOD instead of RAID
  - List each mount point as its own data directory line in the config file (data_file_directories in cassandra.yaml)
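The iostat/dd test above can be approximated in Python as a rough sketch: it runs one long sequential read and one long sequential write at the same time in two threads and reports MiB/s for each. The 1 MiB block size, the file names and all helper names are assumptions for illustration, not part of the original deck. For a meaningful result, point `read_file` at an existing file on the data disk that is larger than RAM (otherwise the read is served from the page cache), and watch `iostat -x 1` in another terminal while it runs.

```python
import os
import tempfile
import threading
import time

BLOCK = 1024 * 1024  # 1 MiB per I/O, similar to dd bs=1M (an assumption)


def sequential_write(path, total_mb, results):
    """Write total_mb MiB sequentially, fsync, and record throughput in MiB/s."""
    buf = os.urandom(BLOCK)
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(total_mb):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())  # make sure the data actually reached the disk
    elapsed = max(time.perf_counter() - start, 1e-9)
    results["write_mb_s"] = total_mb / elapsed


def sequential_read(path, results):
    """Read a file front to back in BLOCK-sized chunks and record MiB/s."""
    read_bytes = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while chunk := f.read(BLOCK):
            read_bytes += len(chunk)
    elapsed = max(time.perf_counter() - start, 1e-9)
    results["read_mb_s"] = read_bytes / BLOCK / elapsed


def concurrent_read_write(read_file, write_dir, write_mb):
    """Run the long read and the long write at the same time.
    If one rate collapses while the other runs, the disk can't do both."""
    results = {}
    write_path = os.path.join(write_dir, "io_test_write.bin")  # hypothetical name
    threads = [
        threading.Thread(target=sequential_read, args=(read_file, results)),
        threading.Thread(target=sequential_write, args=(write_path, write_mb, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    os.remove(write_path)  # clean up the written test file
    return results


if __name__ == "__main__":
    # Tiny self-contained demo; a real test needs files much larger than RAM.
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "io_test_read.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(8 * BLOCK))
        print(concurrent_read_write(src, d, write_mb=8))
```

Threads are enough here because CPython releases the GIL during file I/O, so the read and the write really do overlap.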
Scaling Busters: SSD Considerations
• Scheduler! CFQ is wrong. Use deadline or noop
  - Ex: echo noop > /sys/block/sda/queue/scheduler
• Turn rotational off
  - Ex: echo 0 > /sys/block/sda/queue/rotational
• Read-ahead buffers
  - Ex: echo 0 > /sys/block/sda/queue/read_ahead_kb
  - Start with 0 (better for random reads)
  - Walk it up while testing under your load
• Commit log and data can coexist
• MLC drives, not SLC. Save your money
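To audit the SSD settings above before changing them, a read-only check can be sketched in Python. It reads the same sysfs files the `echo` commands write and reports the active scheduler, the rotational flag and read_ahead_kb. The function names and the `sysfs_root` parameter are hypothetical; `make_fake_sysfs` exists only so the check can be tried without root. The scheduler file format (all names listed, active one in brackets) is standard Linux; `none` and `mq-deadline` are the equivalents of noop and deadline on newer multi-queue kernels.

```python
import os
import tempfile

GOOD_SCHEDULERS = {"noop", "deadline", "none", "mq-deadline"}


def read_queue_attr(device, name, sysfs_root="/sys/block"):
    """Read <sysfs_root>/<device>/queue/<name>."""
    with open(os.path.join(sysfs_root, device, "queue", name)) as f:
        return f.read().strip()


def active_scheduler(raw):
    """The kernel brackets the active scheduler, e.g. 'noop [deadline] cfq' -> 'deadline'."""
    for name in raw.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return raw  # some kernels print a single name without brackets


def check_ssd_queue(device, sysfs_root="/sys/block"):
    """Compare a block device's queue settings with the SSD advice on the slide."""
    sched = active_scheduler(read_queue_attr(device, "scheduler", sysfs_root))
    rotational = read_queue_attr(device, "rotational", sysfs_root)
    read_ahead_kb = int(read_queue_attr(device, "read_ahead_kb", sysfs_root))
    problems = []
    if sched not in GOOD_SCHEDULERS:
        problems.append(f"scheduler is {sched}; use deadline or noop")
    if rotational != "0":
        problems.append("rotational is on; echo 0 > .../queue/rotational")
    return {
        "scheduler": sched,
        "rotational": rotational,
        # Reported but not flagged: the advice is to start at 0 and walk it up under load.
        "read_ahead_kb": read_ahead_kb,
        "problems": problems,
    }


def make_fake_sysfs(devices):
    """Build a throwaway sysfs-like tree for trying the check without root.
    devices maps a device name to (scheduler, rotational, read_ahead_kb) strings."""
    root = tempfile.mkdtemp()
    for dev, values in devices.items():
        qdir = os.path.join(root, dev, "queue")
        os.makedirs(qdir)
        for name, value in zip(("scheduler", "rotational", "read_ahead_kb"), values):
            with open(os.path.join(qdir, name), "w") as f:
                f.write(value + "\n")
    return root
```

On a real host, `check_ssd_queue("sda")` reads the live values; the function never writes, so applying the fixes is still left to the `echo` commands.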
Scaling Busters: OS Tuning
• Process limits: > 10000
• Open files: unlimited
• Memory and network
• Turn swap off
• Read this: Recommended production settings
http://www.datastax.com/documentation/cassandra/1.2/index.html#cassandra/install/installRecommendSettings.html
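A quick way to see whether a host meets the OS limits above is to query them from Python's standard `resource` module and `/proc/swaps`; the sketch below does that. Run it as the user Cassandra runs as, because it reports the limits of its own process. The 10000 process threshold comes from the slide; the 100000 open-files target is an assumption, used because Linux caps RLIMIT_NOFILE at `fs.nr_open` and will not accept a literal "unlimited" there. Function names are hypothetical.

```python
import resource

MIN_NPROC = 10000    # slide: process limits > 10000
MIN_NOFILE = 100000  # assumed target standing in for "unlimited" open files


def check_limits(nproc_soft, nofile_soft):
    """Return a list of problems for the given soft limits."""
    problems = []
    if nproc_soft != resource.RLIM_INFINITY and nproc_soft <= MIN_NPROC:
        problems.append(f"max user processes is {nproc_soft}; raise it above {MIN_NPROC}")
    if nofile_soft != resource.RLIM_INFINITY and nofile_soft < MIN_NOFILE:
        problems.append(f"open files limit is {nofile_soft}; raise it (assumed target {MIN_NOFILE})")
    return problems


def swap_is_off(proc_swaps_text):
    """/proc/swaps prints a header line plus one line per active swap area."""
    lines = [line for line in proc_swaps_text.splitlines() if line.strip()]
    return len(lines) <= 1


def check_this_host():
    """Check this process's soft limits and, on Linux, whether any swap is active."""
    nproc, _ = resource.getrlimit(resource.RLIMIT_NPROC)
    nofile, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    problems = check_limits(nproc, nofile)
    try:
        with open("/proc/swaps") as f:
            if not swap_is_off(f.read()):
                problems.append("swap is on; turn it off (swapoff -a, and remove it from /etc/fstab)")
    except FileNotFoundError:
        pass  # no /proc/swaps (not Linux): nothing to check
    return problems


if __name__ == "__main__":
    for problem in check_this_host() or ["limits and swap look fine"]:
        print(problem)
```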
Scaling Busters: Horrible Use Cases
• Relational model projected onto Cassandra
  - Lots of tables needing a join
  - Normalized data everywhere
  - “How can I migrate my RDBMS data to C*?”
• Deep and perverse desire for a lock
• Using secondary indexes to simulate an RDBMS
• Row cache with a lot of small slices
Great ideas from the real world
• Proper TTLs with reverse comparators
• GZIP blob data in column values
• Load testing with the production data model
  - And similar production data!
• Engaging experts
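In CQL terms, a reverse comparator is a clustering column declared with `WITH CLUSTERING ORDER BY (... DESC)`, and a TTL is set per write with `USING TTL <seconds>`. The Python below is only an in-memory model of why the pairing suits time series: rows are kept newest first, so a "latest N" read can stop early, and old rows expire on their own instead of needing explicit deletes. The class and method names are hypothetical, and this is not how Cassandra's storage engine is implemented.

```python
import bisect


class ReversedTTLPartition:
    """Toy model of one time-series partition: clustering on a timestamp in
    descending order (the 'reverse comparator') with a TTL on every cell."""

    def __init__(self):
        self._keys = []   # negated timestamps, so ascending order means newest first
        self._cells = {}  # negated timestamp -> (value, expires_at)

    def insert(self, ts, value, ttl, now):
        """Like INSERT ... USING TTL ttl: the cell stops being returned at now + ttl."""
        key = -ts
        if key not in self._cells:
            bisect.insort(self._keys, key)
        self._cells[key] = (value, now + ttl)

    def latest(self, n, now):
        """Newest-first slice of live cells, like SELECT ... LIMIT n on a DESC table.
        Because the order is already newest first, the scan stops after n hits."""
        if n <= 0:
            return []
        out = []
        for key in self._keys:
            value, expires_at = self._cells[key]
            if expires_at > now:
                out.append((-key, value))
                if len(out) == n:
                    break
        return out

    def purge_expired(self, now):
        """Physically drop expired cells (the real engine reclaims them during compaction)."""
        self._keys = [k for k in self._keys if self._cells[k][1] > now]
        self._cells = {k: self._cells[k] for k in self._keys}


if __name__ == "__main__":
    p = ReversedTTLPartition()
    for ts in (100, 200, 300):
        p.insert(ts, f"reading@{ts}", ttl=150, now=ts)
    print(p.latest(2, now=320))
```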
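For the GZIP tip, a minimal client-side sketch using only the Python standard library: serialize the value, gzip it, and store the bytes in a blob column; reverse the steps on read. The JSON encoding is an assumption (the slide does not say what the blobs contain), and the helper names are hypothetical. Compressing on the client shrinks both what crosses the network and what lands on disk.

```python
import gzip
import json


def pack_value(obj):
    """Serialize obj to compact JSON and gzip it for a blob column."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


def unpack_value(blob):
    """Reverse of pack_value: gunzip, then parse the JSON."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


if __name__ == "__main__":
    event = {"user": "ctodd", "tags": ["cassandra"] * 200}  # sample, highly repetitive
    blob = pack_value(event)
    raw_len = len(json.dumps(event, separators=(",", ":")))
    print(raw_len, "bytes as JSON,", len(blob), "bytes gzipped")
```

Repetitive text payloads compress well; already-compressed data (images, video) will not shrink and only costs CPU.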
Success Plan: Learn Data Modeling
• The Data Model is Dead, Long Live the Data Model
• Become a Super Modeler
• Next top Data Model
My data modeling webinars on Planet Cassandra
Success Plan: Learn CQL
CREATE TABLE username_video_index (
  username varchar,
  videoid uuid,
  upload_date timestamp,
  video_name varchar,
  PRIMARY KEY (username, videoid)
);

SELECT video_name
FROM username_video_index
WHERE username = 'ctodd'
AND videoid = 99051fe9;  -- videoid is abbreviated on the slide; a uuid column takes a full, unquoted UUID literal
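To see what `PRIMARY KEY (username, videoid)` means, here is a toy in-memory model in Python (the class is hypothetical, not a driver API): `username` is the partition key, so every lookup starts there, and `videoid` is the clustering column that locates the row inside that partition. Writing the same key twice overwrites the row, mirroring CQL's upsert behavior.

```python
import uuid
from datetime import datetime, timezone


class UsernameVideoIndex:
    """Toy model of username_video_index with PRIMARY KEY (username, videoid)."""

    def __init__(self):
        # partition key (username) -> clustering key (videoid) -> remaining columns
        self._partitions = {}

    def insert(self, username, videoid, upload_date, video_name):
        # Same primary key written again simply overwrites the row (an upsert).
        self._partitions.setdefault(username, {})[videoid] = {
            "upload_date": upload_date,
            "video_name": video_name,
        }

    def select_video_name(self, username, videoid):
        # Partition key first (in Cassandra this picks the replica nodes),
        # then the clustering key inside that one partition.
        row = self._partitions.get(username, {}).get(videoid)
        return None if row is None else row["video_name"]

    def select_partition(self, username):
        # WHERE username = ? alone: one partition, every videoid in it.
        return dict(self._partitions.get(username, {}))


if __name__ == "__main__":
    idx = UsernameVideoIndex()
    vid = uuid.uuid4()
    idx.insert("ctodd", vid, datetime.now(timezone.utc), "My first video")  # sample data
    print(idx.select_video_name("ctodd", vid))
```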
Success Plan: Use DataStax Drivers
• Async IO (Netty for Java)
• Replace multi-get with executeAsync()
• Token-aware strategy
• Java Driver
• C# Driver
• Python Driver (soon)
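The "replace multi-get with executeAsync()" advice is a fan-out pattern: issue one small request per key without waiting, then collect all the results. The sketch below shows the pattern with plain `asyncio` and a hypothetical `fetch_row` coroutine that simulates a round trip, so it runs without a cluster. With the actual drivers the calls are `executeAsync()` in Java and `Session.execute_async()` in Python; this sketch does not use either.

```python
import asyncio
import time


async def fetch_row(key):
    """Hypothetical single-partition read standing in for one driver query."""
    await asyncio.sleep(0.01)  # simulated network round trip
    return {"key": key, "value": f"value-for-{key}"}


async def multi_get_sequential(keys):
    # One round trip after another: total time grows with len(keys).
    return [await fetch_row(k) for k in keys]


async def multi_get_concurrent(keys):
    # Issue every request up front, then wait for all of them;
    # gather() returns results in the same order as keys.
    return await asyncio.gather(*(fetch_row(k) for k in keys))


def timed(coro_fn, keys):
    """Run one of the multi-get variants and return (results, seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(keys))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    keys = [f"user{i}" for i in range(20)]
    _, t_seq = timed(multi_get_sequential, keys)
    _, t_conc = timed(multi_get_concurrent, keys)
    print(f"sequential {t_seq:.3f}s, concurrent {t_conc:.3f}s")
```

Each small request can also be routed to a replica that owns its key, instead of one coordinator collecting a large multi-get and waiting on the slowest node.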
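Token-aware routing means the client hashes the partition key itself and sends the request straight to a replica, skipping an extra coordinator hop. A simplified Python model of the ring under stated assumptions: MD5 tokens as in RandomPartitioner (Murmur3Partitioner, the default since Cassandra 1.2, hashes differently), one token per node (no vnodes), and SimpleStrategy-style placement. Node names and tokens are made up. In the DataStax Java driver this behavior comes from wrapping a load-balancing policy in `TokenAwarePolicy`.

```python
import bisect
import hashlib


def md5_token(partition_key):
    """RandomPartitioner-style token: abs() of the MD5 digest read as a signed 128-bit int."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


class TokenRing:
    def __init__(self, node_tokens):
        """node_tokens maps node name -> its single token (no vnodes in this sketch)."""
        pairs = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in pairs]
        self._nodes = [node for _, node in pairs]

    def replicas(self, partition_key, rf=3):
        """The first node whose token is >= the key's token owns the key (wrapping past
        the highest token back to the lowest); the next rf-1 nodes clockwise hold copies."""
        start = bisect.bisect_left(self._tokens, md5_token(partition_key))
        n = len(self._nodes)
        return [self._nodes[(start + i) % n] for i in range(min(rf, n))]


if __name__ == "__main__":
    # Four hypothetical nodes with evenly spaced tokens over 0..2**127.
    ring = TokenRing({f"node{i + 1}": i * 2**125 for i in range(4)})
    # A token-aware client would send this query straight to the first replica.
    print(ring.replicas("ctodd"))
```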
Success Plan: Great Online Resources!
• Cassandra Summit 2013 SF online now!
• Planet Cassandra (www.planetcassandra.org)
• IRC #cassandra on irc.freenode.com
• Users mailing list
Cassandra Summit Europe 2013
• Two days
• 30+ sessions
• Training day
• Call for papers
• Sponsorship opportunity
Thank You
Q&A