Cassandra at scale

Apache Cassandra at Scale
Patrick McFadin | Solution Architect | DataStax

Saturday, July 13, 2013

Who is this dude?

• Patrick McFadin
• Solution Architect at DataStax
• Cassandra MVP
• User for years
• Follow me for more: @PatrickMcFadin

I talk about Cassandra and building scalable, resilient apps ALL THE TIME!


What do you mean “at scale”?

• Personally been involved in ~1000 node deployments
• 0.5 PB total space
• Millions of transactions per second
• Critical lines of business
• Multiple datacenters



Time to scale


A few tips to help you get there


Scaling Busters: Disk IO

• Cassandra is (almost) never CPU bound
• Can your server do this?
  - Disk system: long sequential read and long sequential write, at the same time?!!
• No? You have trouble.
• Shared storage (NAS, iSCSI)
  - Just no. See above.
  - IOPS aren’t going to help


Scaling Busters: Spinning disk considerations

• Separate commit and data disks
• Tune for reads and writes at the same time
  - Quick test while watching iostat:
    • Start a long read using the dd command
    • Start a long write using the dd command
    • Did one of them drop to the floor? #fail
• Think about using JBOD instead of RAID
  - Each mount point is a data dir line listed in the config file
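
The quick test above can be approximated without dd. What follows is a minimal Python sketch, not the author's procedure: it starts one long sequential read and one long sequential write at the same time and reports the throughput of each. The function names, file names, and sizes are assumptions made for illustration. On a real server the files should be much larger than RAM, otherwise the read is served from the page cache and the numbers say little about the disks.

```python
import os
import tempfile
import threading
import time

CHUNK = 1024 * 1024  # 1 MiB blocks, similar in spirit to dd bs=1M


def sequential_write(path, total_mb, results):
    """Write total_mb MiB sequentially, fsync, and record MiB/s."""
    block = b"\0" * CHUNK
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(total_mb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
    results["write_mb_s"] = total_mb / (time.perf_counter() - start)


def sequential_read(path, results):
    """Read the whole file sequentially and record MiB/s."""
    read_bytes = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            read_bytes += len(chunk)
    results["read_mb_s"] = (read_bytes / CHUNK) / (time.perf_counter() - start)


def mixed_io_test(directory, total_mb=64):
    """Run one long read and one long write at the same time."""
    results = {}
    read_path = os.path.join(directory, "read_src.bin")    # hypothetical file name
    write_path = os.path.join(directory, "write_dst.bin")  # hypothetical file name
    sequential_write(read_path, total_mb, {})  # prepare a file to read back
    threads = [
        threading.Thread(target=sequential_read, args=(read_path, results)),
        threading.Thread(target=sequential_write,
                         args=(write_path, total_mb, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    os.remove(read_path)
    os.remove(write_path)
    return results


if __name__ == "__main__":
    # Demo against a temporary directory; point it at a data mount for a real test.
    with tempfile.TemporaryDirectory() as demo_dir:
        print(mixed_io_test(demo_dir, total_mb=16))
```

If one of the two numbers collapses while iostat shows the device saturated, that is the "#fail" case from the slide.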



Scaling Busters: SSD considerations

• Scheduler! CFQ is wrong. Use deadline or noop
  - EX: echo noop > /sys/block/sda/queue/scheduler
• Turn rotational off
  - EX: echo 0 > /sys/block/sda/queue/rotational
• Read-ahead buffers
  - EX: echo 0 > /sys/block/sda/queue/read_ahead_kb
  - Start with 0 (better for random reads)
  - Walk it up while testing under your load
• Commit and data can coexist
• MLC drives, not SLC. Save your money
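
Before changing any of the settings above, it helps to see what a device is currently using. A minimal sketch, assuming the Linux sysfs layout shown in the EX lines: it reads the three queue files and flags the two cases the slide calls out. The function names are hypothetical. Newer multi-queue kernels report schedulers such as `none` and `mq-deadline` instead of `noop` and `deadline`, so only `cfq` is flagged here.

```python
from pathlib import Path


def active_scheduler(scheduler_line):
    """Return the scheduler shown in [brackets], e.g. 'noop deadline [cfq]' -> 'cfq'."""
    for token in scheduler_line.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_queue_settings(device, sys_block="/sys/block"):
    """Read the three queue settings from the slide for one block device."""
    queue = Path(sys_block) / device / "queue"
    return {
        "scheduler": active_scheduler((queue / "scheduler").read_text()),
        "rotational": int((queue / "rotational").read_text()),
        "read_ahead_kb": int((queue / "read_ahead_kb").read_text()),
    }


def ssd_warnings(settings):
    """List the settings that contradict the slide's SSD advice."""
    warnings = []
    if settings["scheduler"] == "cfq":
        warnings.append("scheduler is cfq; the slide suggests deadline or noop")
    if settings["rotational"] != 0:
        warnings.append("rotational is not 0")
    return warnings
```

For example, `ssd_warnings(read_queue_settings("sda"))` returns an empty list when nothing needs attention. Read-ahead is reported but not judged, since the slide says to tune it under your own load.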



Scaling Busters: OS tuning

• Process limits > 10000
• Open files > unlimited
• Memory and network
• Turn swap off
• Read this: Recommended production settings


http://www.datastax.com/documentation/cassandra/1.2/index.html#cassandra/install/installRecommendSettings.html
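
A quick way to see whether the current process already has the limits mentioned above is Python's standard `resource` module (Unix only). This is a minimal sketch: the function names are hypothetical, and the open-files threshold is a placeholder assumption, because the slide asks for "unlimited" and not for a specific number. Use the values from the recommended production settings page for real checks.

```python
import resource


def meets_minimum(limit, minimum):
    """True if the limit is unlimited or strictly greater than minimum."""
    return limit == resource.RLIM_INFINITY or limit > minimum


def check_limits(min_processes=10000, min_open_files=10000):
    """Check the soft limits of the current process.

    min_open_files is an assumed placeholder, not a value from the slide.
    """
    soft_nproc, _ = resource.getrlimit(resource.RLIMIT_NPROC)
    soft_nofile, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    return {
        "processes_ok": meets_minimum(soft_nproc, min_processes),
        "open_files_ok": meets_minimum(soft_nofile, min_open_files),
    }


if __name__ == "__main__":
    print(check_limits())
```

Limits are per process, so the result is only meaningful when this runs as the same user and from the same kind of session that starts Cassandra.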



Scaling Busters: Horrible use cases

• Relational model projected
  - Lots of tables needing a join
  - Normalized data everywhere
  - “How can I migrate my RDBMS data to C*”
• Deep and perverse desire for a lock
• Using secondary indexes to simulate an RDBMS
• Row cache with a lot of small slices



Great ideas from the real world

• Proper TTLs with reverse comparators
• GZIP blob data in column values
• Load testing with production data model
  - And similar production data!
• Engaging experts
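
The GZIP idea above is done on the client: compress before writing the blob column, decompress after reading it. A minimal sketch using only the Python standard library follows. JSON as the serialization format and the function names are assumptions made for illustration, and the write to Cassandra itself is left out.

```python
import gzip
import json


def pack_blob(obj):
    """Serialize to JSON and gzip the bytes for storage in a blob column."""
    return gzip.compress(json.dumps(obj).encode("utf-8"))


def unpack_blob(blob):
    """Reverse pack_blob: gunzip the column value and parse the JSON."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))
```

The trade is a little client CPU for less disk and network traffic, which fits the earlier point that Cassandra is (almost) never CPU bound. Small or already-compressed values may not shrink at all, so it is worth measuring on production-like data.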



Success Plan: Learn data modeling

My data modeling webinars on Planet Cassandra:

• The Data Model is Dead, Long Live the Data Model
• Become a Super Modeler
• Next top Data Model


Success Plan: Learn CQL

CREATE TABLE username_video_index (
    username varchar,
    videoid uuid,
    upload_date timestamp,
    video_name varchar,
    PRIMARY KEY (username, videoid)
);

SELECT video_name
FROM username_video_index
WHERE username = 'ctodd'
AND videoid = '99051fe9';


Success Plan: Use DataStax drivers

• Async IO (Netty for Java)
• Replace multi-get with executeAsync()
• Token aware strategy

• Java Driver
• C# Driver
• Python Driver (soon)
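
The "replace multi-get" point is about shape: instead of one request that asks a single coordinator for many keys, send one small request per key concurrently and collect the results. The sketch below shows only that fan-out pattern with the Python standard library. It does not use a DataStax driver, and `fake_execute` and `FAKE_TABLE` are hypothetical stand-ins for a real single-key query.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for a table keyed by videoid.
FAKE_TABLE = {"v1": "intro", "v2": "data modeling", "v3": "drivers"}


def fake_execute(videoid):
    """Stand-in for one single-key query; a real app would call the driver here."""
    return FAKE_TABLE.get(videoid)


def fetch_all(videoids, max_in_flight=8):
    """Issue one request per key concurrently, then collect results by key."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        futures = {vid: pool.submit(fake_execute, vid) for vid in videoids}
        return {vid: fut.result() for vid, fut in futures.items()}
```

With a token aware policy, each of those single-key requests can be routed to a node that owns the key, which is why this pattern pairs well with the third bullet.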



Success Plan: Great online resources!

• Cassandra Summit 2013 SF online now!
• Planet Cassandra (www.planetcassandra.org)
• IRC #cassandra on irc.freenode.com
• Users mailing list


Cassandra Summit Europe 2013

• Two days, 30+ sessions
• Training day
• Call for papers
• Sponsorship opportunity


Thank You

Q&A

