Cassandra at scale

Apache Cassandra at Scale Patrick McFadin | Solution Architect | DataStax Saturday, July 13, 13

description

A 30-minute talk I gave at Cassandra Dublin and Cassandra London. Just some things I've learned along the way as I've helped some of the largest users of Cassandra be successful. Learn from other people's mistakes!

Transcript of Cassandra at scale

Page 1: Cassandra at scale

Apache Cassandra at Scale
Patrick McFadin | Solution Architect | DataStax



Page 3: Cassandra at scale

Who is this dude?

• Patrick McFadin
• Solution Architect at DataStax
• Cassandra MVP
• User for years
• Follow me for more: @PatrickMcFadin

I talk about Cassandra and building scalable, resilient apps ALL THE TIME!

Page 4: Cassandra at scale

What do you mean “at scale”?

• Personally been involved in ~1000 node deployments
• 0.5 PB total space
• Millions of transactions per second
• Critical lines of business
• Multiple datacenters

Page 5: Cassandra at scale

Time to scale

A few tips to help you get there

Page 6: Cassandra at scale

Scaling busters: Disk IO

• Cassandra is (almost) never CPU bound
• Can your server do this?
  [Diagram: disk system doing a long sequential read and a long sequential write. At the same time?!!]
• No? You have trouble.
• Shared storage (NAS, iSCSI)
  - Just no. See above.
  - IOPS aren’t going to help

Page 7: Cassandra at scale

Scaling busters: Spinning disk considerations

• Separate commit and data disks
• Tune for reads and writes at the same time.
  - Quick test while watching iostat:
    • Start a long read using dd command
    • Start a long write using dd command
    • Did one of them drop to the floor? #fail
• Think about using JBOD instead of RAID.
  - Each mount point a data dir line listed in config file
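
The quick test on this slide is normally run with two dd commands. As a rough stand-in, the Python sketch below starts a sequential read and a sequential write at the same time and reports the throughput of each. The function names, file names, and the 1 MiB chunk size are assumptions made for illustration and do not come from the talk. Reads go through the OS page cache here, so the numbers only mean something with files much larger than RAM, and iostat should still be watched while it runs.

```python
import os
import tempfile
import threading
import time

CHUNK = 1024 * 1024  # 1 MiB per I/O call (assumed value)


def _mb_per_s(nbytes, seconds):
    # Guard against a zero-length interval on very small files.
    return nbytes / (1024 * 1024) / max(seconds, 1e-9)


def sequential_write(path, total_bytes, results):
    """Write total_bytes of zeros to path front to back, then fsync."""
    buf = b"\0" * CHUNK
    written = 0
    start = time.perf_counter()
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(CHUNK, total_bytes - written)
            f.write(buf[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())
    results["write_bytes"] = written
    results["write_mb_s"] = _mb_per_s(written, time.perf_counter() - start)


def sequential_read(path, results):
    """Read path front to back in CHUNK-sized pieces."""
    read = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            read += len(chunk)
    results["read_bytes"] = read
    results["read_mb_s"] = _mb_per_s(read, time.perf_counter() - start)


def run_mixed_io(directory, total_bytes):
    """Run one long read and one long write concurrently in directory."""
    read_path = os.path.join(directory, "read_test.bin")
    write_path = os.path.join(directory, "write_test.bin")
    # Prepare the file that the reader will scan.
    sequential_write(read_path, total_bytes, {})
    results = {}
    threads = [
        threading.Thread(target=sequential_read, args=(read_path, results)),
        threading.Thread(
            target=sequential_write, args=(write_path, total_bytes, results)
        ),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    os.remove(read_path)
    os.remove(write_path)
    return results
```

Calling run_mixed_io on a directory that lives on the data disk, with a size of several gigabytes, gives one read figure and one write figure to compare. If one of them is a small fraction of what the disk does alone, that is the "drop to the floor" case.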


Page 8: Cassandra at scale

Scaling busters: SSD considerations

• Scheduler! CFQ is wrong. Use deadline or noop
  - EX: echo noop > /sys/block/sda/queue/scheduler
• Turn rotational off
  - EX: echo 0 > /sys/block/sda/queue/rotational
• Read ahead buffers
  - EX: echo 0 > /sys/block/sda/queue/read_ahead_kb
  - Start with 0 (better for random reads)
  - Walk it up while testing under your load
• Commit and data can coexist
• MLC drives, not SLC. Save your money
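
Before changing the settings above it can help to see what a device is currently using. The sketch below is one possible way to do that in Python: it parses the text of the scheduler file, where the kernel marks the active scheduler with square brackets, and flags a device that does not match the slide's advice. The function names are hypothetical, and the accepted scheduler names (deadline, noop) are taken from the slide; newer kernels use different names, so treat that list as an assumption.

```python
def active_scheduler(contents):
    """Return the scheduler shown in [brackets], e.g. 'noop deadline [cfq]'."""
    for token in contents.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_queue_setting(device, name):
    """Read /sys/block/<device>/queue/<name> as stripped text (Linux only)."""
    with open(f"/sys/block/{device}/queue/{name}") as f:
        return f.read().strip()


def check_ssd_settings(scheduler, rotational):
    """Compare raw sysfs values against the slide's SSD advice.

    Returns a list of human-readable problems; an empty list means
    both settings match.
    """
    problems = []
    active = active_scheduler(scheduler)
    if active not in ("deadline", "noop"):  # names as given on the slide
        problems.append(f"scheduler is {active}, expected deadline or noop")
    if rotational.strip() != "0":
        problems.append("rotational flag is still on")
    return problems
```

For the device used in the examples, check_ssd_settings(read_queue_setting("sda", "scheduler"), read_queue_setting("sda", "rotational")) returns the list of mismatches. Read-ahead is left out on purpose, since the slide says to start at 0 and tune it under load rather than fix one value.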


Page 9: Cassandra at scale

Scaling busters: OS tuning

• Process limits > 10000
• Open files > unlimited
• Memory and network
• Turn swap off
• Read this: Recommended production settings
  http://www.datastax.com/documentation/cassandra/1.2/index.html#cassandra/install/installRecommendSettings.html
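
A quick way to see where a machine stands on the first, second, and fourth bullets is to ask the OS directly. The sketch below uses Python's standard resource module to read the process and open-file limits of the current process, and parses the SwapTotal line of /proc/meminfo text to see whether swap is configured. The function names are assumptions for illustration. The limits reported are those of the process running the script, so it should run as the same user that runs Cassandra.

```python
import resource


def describe(limit):
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def limit_report():
    """Return soft and hard limits for processes and open files."""
    report = {}
    for label, which in (
        ("max processes", resource.RLIMIT_NPROC),
        ("open files", resource.RLIMIT_NOFILE),
    ):
        soft, hard = resource.getrlimit(which)
        report[label] = {"soft": describe(soft), "hard": describe(hard)}
    return report


def swap_total_kb(meminfo_text):
    """Return the SwapTotal value in kB from /proc/meminfo text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    return None
```

On Linux, passing the contents of /proc/meminfo to swap_total_kb and getting 0 back means no swap is active, which is what the slide asks for.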


Page 10: Cassandra at scale

Scaling busters: Horrible use cases

• Relational model projected.
  - Lots of tables needing a join
  - Normalized data everywhere
  - “How can I migrate my RDBMS data to C*”
• Deep and perverse desire for a lock
• Using secondary indexes to simulate a RDBMS
• Row cache with a lot of small slices

Page 11: Cassandra at scale

Great ideas from the real world

• Proper TTLs with reverse comparators
• GZIP blob data in column values
• Load testing with production data model
  - And similar production data!
• Engaging experts
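
The GZIP idea in the list above can be sketched on the client side with nothing but Python's standard gzip module: compress the payload before writing it to a blob column and decompress it after reading. The helper names are hypothetical, and the write to and read from Cassandra itself is left out, since it depends on the driver in use.

```python
import gzip


def pack_blob(text):
    """Compress text into bytes suitable for storing in a blob column."""
    return gzip.compress(text.encode("utf-8"))


def unpack_blob(blob):
    """Reverse pack_blob: decompress the stored bytes back into text."""
    return gzip.decompress(blob).decode("utf-8")
```

The trade is client CPU for smaller column values. It tends to pay off for large, repetitive payloads such as JSON or XML documents and is unlikely to help for values that are already small or already compressed.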


Page 12: Cassandra at scale

Success Plan: Learn Data Modeling

My data modeling webinars on Planet Cassandra:

• The Data Model is Dead, Long Live the Data Model
• Become a Super Modeler
• Next top Data Model

Page 13: Cassandra at scale

Success Plan: Learn CQL

    CREATE TABLE username_video_index (
        username varchar,
        videoid uuid,
        upload_date timestamp,
        video_name varchar,
        PRIMARY KEY (username, videoid)
    );

    SELECT video_name
    FROM username_video_index
    WHERE username = 'ctodd'
    AND videoid = '99051fe9'

Page 14: Cassandra at scale

Success Plan: Use DataStax Drivers

• Async IO. (Netty for Java)
• Replace multi-get with executeAsync()
• Token aware strategy

• Java Driver
• C# Driver
• Python Driver (soon)
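
The "replace multi-get with executeAsync()" advice comes down to a fan-out pattern: issue one single-row request per key without waiting, then collect the results. The Python sketch below shows only that pattern. It does not use a DataStax driver; fetch_video_name and FAKE_TABLE are hypothetical stand-ins for a single-row query, and a thread pool plays the role that the driver's asynchronous call would play in real code.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for the username_video_index table.
FAKE_TABLE = {
    ("ctodd", "v1"): "First video",
    ("ctodd", "v2"): "Second video",
}


def fetch_video_name(username, videoid):
    """Stand-in for one single-row query; returns None if the row is absent."""
    return FAKE_TABLE.get((username, videoid))


def fetch_many(username, videoids, max_in_flight=8):
    """Issue one request per videoid concurrently, then gather the results."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # Submit everything first so the requests overlap in time.
        futures = {
            vid: pool.submit(fetch_video_name, username, vid) for vid in videoids
        }
        # Only now block on each result.
        return {vid: future.result() for vid, future in futures.items()}
```

Submitting every request before reading any result is the point of the pattern. With a token-aware driver, each single-key request can go straight to a replica that owns the key, instead of one coordinator fetching all keys for a multi-get.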


Page 15: Cassandra at scale

Success Plan: Great online resources!

• Cassandra Summit 2013 SF online now!
• Planet Cassandra (www.planetcassandra.org)
• IRC #cassandra on irc.freenode.com
• Users mailing list

Page 16: Cassandra at scale

Cassandra Summit Europe 2013

• Call for papers
• Sponsorship opportunity
• Two days, 30+ sessions
• Training day

Page 17: Cassandra at scale

Thank You

Q&A
