C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Patrick McFadin | Chief Evangelist DataStax @PatrickMcFadin Data Model on Fire #CASSANDRAEU Friday, October 18, 13

description

Speaker: Patrick McFadin, Chief Evangelist at DataStax

Video: http://www.youtube.com/watch?v=oUEKMcTsbfU&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=22

Functional data models are great, but how can you squeeze out more performance and make them awesome? Let's talk through some example Cassandra 2.0 models, go through the tuning steps, and understand the tradeoffs. Many times, just a simple understanding of the underlying Cassandra 2.0 internals can make all the difference. I've helped some of the biggest companies in the world do this, and I can help you. Do you feel the need for Cassandra 2.0 speed?

Transcript of C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Page 1: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Data Model on Fire

Patrick McFadin | Chief Evangelist, DataStax | @PatrickMcFadin

Page 2: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Data Model is King

•With 2.0 we now have more choices
•Sometimes the data model is only the first part
•Understanding the underlying engine helps
•You aren’t done until you tune

Load test, baby!

Page 3: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Light Weight Transactions


Page 4: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

The race is on

T0, Process 1:

SELECT firstName, lastName
FROM users
WHERE username = 'pmcfadin';

(0 rows)

T1, Process 2:

SELECT firstName, lastName
FROM users
WHERE username = 'pmcfadin';

(0 rows)

Got nothing! Good to go!

T2, Process 1:

INSERT INTO users (username, firstname, lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin', ['[email protected]'], 'ba27e03fd95e507daf2937c937d499ab', '2011-06-20 13:50:00');

T3, Process 2:

INSERT INTO users (username, firstname, lastname, email, password, created_date)
VALUES ('pmcfadin','Paul','McFadin', ['[email protected]'], 'ea24e13ad95a209ded8912e937d499de', '2011-06-20 13:51:00');

This one wins
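The race on this slide can be replayed with a small in-memory sketch. This is not Cassandra code: the `users` dict, `select_user`, and `insert_user` are hypothetical stand-ins, and the integer timestamps simply model the fact that a plain INSERT is an upsert where the latest write wins.

```python
# In-memory model of the check-then-insert race (not real Cassandra code).
users = {}  # username -> (write_timestamp, row); hypothetical stand-in for the users table


def select_user(username):
    """Model of SELECT ... WHERE username = ?; returns the row or None."""
    entry = users.get(username)
    return entry[1] if entry else None


def insert_user(username, row, write_timestamp):
    """Model of a plain INSERT: an upsert where the highest timestamp wins."""
    current = users.get(username)
    if current is None or write_timestamp >= current[0]:
        users[username] = (write_timestamp, row)


# T0 and T1: both processes check and see no row.
p1_sees = select_user("pmcfadin")
p2_sees = select_user("pmcfadin")

# T2 and T3: both processes insert, because both checks came back empty.
if p1_sees is None:
    insert_user("pmcfadin", {"firstname": "Patrick", "lastname": "McFadin"}, write_timestamp=2)
if p2_sees is None:
    insert_user("pmcfadin", {"firstname": "Paul", "lastname": "McFadin"}, write_timestamp=3)

winner = select_user("pmcfadin")
print(winner["firstname"])  # Paul: the later write silently replaced Patrick
```

Neither process gets an error. The first registration is simply overwritten.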

Page 5: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Solution LWT

Process 1

T0:

INSERT INTO users (username, firstname, lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin', ['[email protected]'], 'ba27e03fd95e507daf2937c937d499ab', '2011-06-20 13:50:00')
IF NOT EXISTS;

T1:

 [applied]
-----------
      True

•Check performed for record
•Paxos ensures exclusive access
•applied = true: Success

Page 6: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Solution LWT

Process 2

T2:

INSERT INTO users (username, firstname, lastname, email, password, created_date)
VALUES ('pmcfadin','Paul','McFadin', ['[email protected]'], 'ea24e13ad95a209ded8912e937d499de', '2011-06-20 13:51:00')
IF NOT EXISTS;

T3:

 [applied] | username | created_date             | firstname | lastname
-----------+----------+--------------------------+-----------+----------
     False | pmcfadin | 2011-06-20 13:50:00-0700 | Patrick   | McFadin

•applied = false: Rejected
•No record stomping!
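A rough way to picture what IF NOT EXISTS buys you is a check and a write that happen as one exclusive step. The sketch below is a toy model only: `UsersTable` and `insert_if_not_exists` are hypothetical names, and a local `threading.Lock` stands in for the exclusivity that Paxos provides across replicas in the real system.

```python
import threading


class UsersTable:
    """Toy stand-in for the users table; a lock plays the role Paxos plays in Cassandra."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def insert_if_not_exists(self, username, row):
        """Model of INSERT ... IF NOT EXISTS: returns a result row with an applied flag."""
        with self._lock:  # only one check-and-write runs at a time
            existing = self._rows.get(username)
            if existing is not None:
                # Rejected: report the row that is already there, as the slide output shows.
                return {"applied": False, "username": username, **existing}
            self._rows[username] = dict(row)
            return {"applied": True}


table = UsersTable()
results = []


def register(firstname):
    row = {"firstname": firstname, "lastname": "McFadin"}
    results.append(table.insert_if_not_exists("pmcfadin", row))


threads = [threading.Thread(target=register, args=(name,)) for name in ("Patrick", "Paul")]
for t in threads:
    t.start()
for t in threads:
    t.join()

applied_count = sum(1 for r in results if r["applied"])
print(applied_count)  # 1: exactly one of the two inserts is applied
```

Which thread wins depends on scheduling, but the loser always gets applied = False plus the existing row, so it can tell the user the name is taken.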

Page 7: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

LWT Fine Print

•Light Weight Transactions solve edge conditions
•They have latency cost.
•Be aware
•Load test
•Consider in your data model
•Now go shut down that ZooKeeper mess you have!

Page 8: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Form Versioning: Revisited


Page 9: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Form Versioning Pt 1

•From “Next top data model”
•Great idea, but edge conditions

CREATE TABLE working_version (
  username varchar,
  form_id int,
  version_number int,
  locked_by varchar,
  form_attributes map<varchar,varchar>,
  PRIMARY KEY ((username, form_id), version_number)
) WITH CLUSTERING ORDER BY (version_number DESC);

•Each user has a form
•Each form needs versioning
•Need an exclusive lock on the form

Page 10: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Form Versioning Pt 1

1. Insert first version

INSERT INTO working_version (username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,1,'',{'FirstName<text>':'First Name: ','LastName<text>':'Last Name: ','EmailAddress<text>':'Email Address: ','Newsletter<radio>':'Y,N'});

2. Lock for one user

UPDATE working_version
SET locked_by = 'pmcfadin'
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1;

3. Insert new version. Release lock

INSERT INTO working_version (username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,2,null,{'FirstName<text>':'First Name: ','LastName<text>':'Last Name: ','EmailAddress<text>':'Email Address: ','Newsletter<checkbox>':'Y'});

Danger Zone

Page 11: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Form Versioning Pt 2

1. Insert first version

Exclusive lock

INSERT INTO working_version (username, form_id, version_number, locked_by, form_attributes)
VALUES ('pmcfadin',1138,1,'pmcfadin',{'FirstName<text>':'First Name: ','LastName<text>':'Last Name: ','EmailAddress<text>':'Email Address: ','Newsletter<radio>':'Y,N'})
IF NOT EXISTS;

Accepted

UPDATE working_version
SET form_attributes['EmailAddress<text>'] = 'Primary Email Address: '
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1
IF locked_by = 'pmcfadin';

Rejected (sorry dude)

UPDATE working_version
SET form_attributes['EmailAddress<text>'] = 'Email Adx: '
WHERE username = 'pmcfadin'
AND form_id = 1138
AND version_number = 1
IF locked_by = 'dude';
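The accepted and rejected updates behave like a compare-and-set on the locked_by column. Here is a minimal in-memory sketch of that behaviour, assuming a single row held in a dict; `update_attribute_if_locked_by` is a hypothetical helper, not a driver or Cassandra API.

```python
# Toy model of one working_version row (not real Cassandra code).
row = {
    "locked_by": "pmcfadin",
    "form_attributes": {"EmailAddress<text>": "Email Address: "},
}


def update_attribute_if_locked_by(row, expected_owner, key, value):
    """Model of UPDATE ... SET form_attributes[key] = value ... IF locked_by = expected_owner."""
    if row["locked_by"] != expected_owner:
        # Rejected: report the current value of the condition column.
        return {"applied": False, "locked_by": row["locked_by"]}
    row["form_attributes"][key] = value
    return {"applied": True}


accepted = update_attribute_if_locked_by(
    row, "pmcfadin", "EmailAddress<text>", "Primary Email Address: "
)
rejected = update_attribute_if_locked_by(
    row, "dude", "EmailAddress<text>", "Email Adx: "
)

print(accepted["applied"], rejected["applied"])  # True False
print(repr(row["form_attributes"]["EmailAddress<text>"]))  # 'Primary Email Address: '
```

The rejected caller learns who holds the lock from the returned locked_by value, and the row keeps the lock holder's edit.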

Page 12: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Form Versioning Pt 2

•Old way: Edge cases with problems
  •Use external locking?
  •Take your chances?
•New way: Managed expectations (LWT)
  •Exclusive by existence check
  •Continued with IF clause
•Downside: More latency

Page 13: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Fire: Bring it


Page 14: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Cassandra 2.0 Fire

•Great changes in both 1.2 and 2.0 for perf
•Three big changes in 2.0 I like

Single pass compaction

Hints to reduce SSTable reads

Faster index reads from off-heap

Page 18: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Why is this important?

•Reducing SSTable reads means fewer seeks
•Disk seeks can add up fast
•5 seeks on SATA = 60ms of just disk!

Avg Access Time*  Rotation Speed
12ms              7200 RPM
7ms               10k RPM
5ms               15k RPM
.04ms             SSD

* Source: www.tomshardware.com

Shared storage == Great sadness
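The "5 seeks = 60ms" figure is just the average access time multiplied by the number of seeks. A small sketch of that arithmetic using the access times from the table (the function and dict names are made up for illustration):

```python
# Average access times from the slide, in milliseconds.
ACCESS_TIME_MS = {"7200 RPM": 12.0, "10k RPM": 7.0, "15k RPM": 5.0, "SSD": 0.04}


def disk_time_ms(seeks, drive):
    """Time spent on disk alone for a read that needs the given number of seeks."""
    return seeks * ACCESS_TIME_MS[drive]


for drive in ACCESS_TIME_MS:
    print(f"{drive}: {disk_time_ms(5, drive):.2f} ms for 5 seeks")
# 7200 RPM: 60.00 ms for 5 seeks
# 10k RPM: 35.00 ms for 5 seeks
# 15k RPM: 25.00 ms for 5 seeks
# SSD: 0.20 ms for 5 seeks
```

Every SSTable a read can skip takes one of those multiples off the total.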

Page 20: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Quick Diversion

•cfhistograms is your friend
•Histograms of statistics per table
•Collected...
  •per read
  •per write
  •SSTable flush
  •Compaction

nodetool cfhistograms <keyspace> <table>

Page 21: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

How do I even read this thing!

Page 22: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Histograms How to

nodetool cfhistograms videodb users

videodb/users histograms
Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       107       0              0             0               0
2       0         0              0             0               0
10      0         0              0             0               5
250     0         5              0             0               0
800     0         10             50            0               0
1250    0         0              300           5               0

Offset column:

•Unit-less column
•Units are assigned by each column
•Numerical buckets

Page 23: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Histograms How to

nodetool cfhistograms videodb users

videodb/users histograms
Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       107       0              0             0               0
2       2         0              0             0               0
10      0         0              0             0               5
250     0         5              0             0               0
800     0         10             50            0               0
1250    0         0              300           5               0

SSTables column:

•Per read. How many seeks?
•Offset is number of SSTables read
•Less == lower read latency
•107 reads took 1 seek to satisfy

Page 24: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Histograms How to

nodetool cfhistograms videodb users

videodb/users histograms
Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       107       0              0             0               0
2       2         0              0             0               0
10      0         0              0             0               5
250     0         5              0             0               0
800     0         10             50            0               0
1250    0         0              300           5               0

Write Latency column:

•Per write. How fast?
•Offset is microseconds

Page 25: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Histograms How to

nodetool cfhistograms videodb users

videodb/users histograms
Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       107       0              0             0               0
2       2         0              0             0               0
10      0         0              0             0               5
250     0         5              0             0               0
800     0         10             50            0               0
1250    0         0              300           5               0

Read Latency column:

•Per read. How fast?
•Offset is microseconds

Page 26: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Histograms How to

nodetool cfhistograms videodb users

videodb/users histograms
Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       107       0              0             0               0
2       2         0              0             0               0
10      0         0              0             0               5
250     0         5              0             0               0
800     0         10             50            0               0
1250    0         0              300           5               0

Partition Size column:

•Per partition (storage row)
•Offset is size in bytes
•5 partitions are 1250 bytes

Page 27: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Histograms How to

nodetool cfhistograms videodb users

videodb/users histograms
Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       107       0              0             0               0
2       2         0              0             0               0
10      0         0              0             0               5
250     0         5              0             0               0
800     0         10             50            0               0
1250    0         0              300           5               0

Cell Count column:

•Per partition (storage row)
•Offset is count of cells in partition
•5 partitions have 10 cells
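Reading the output by eye gets old quickly, so it can help to load it into a structure keyed by column. The parser below is a sketch that assumes the six-column layout shown on these slides (offset plus five counters per row); real nodetool output varies between versions, and `parse_histogram` and `COLUMNS` are names made up for this example.

```python
SAMPLE = """\
videodb/users histograms
Offset SSTables Write Latency Read Latency Partition Size Cell Count
(micros) (micros) (bytes)
1 107 0 0 0 0
2 2 0 0 0 0
10 0 0 0 0 5
250 0 5 0 0 0
800 0 10 50 0 0
1250 0 0 300 5 0
"""

COLUMNS = ("sstables", "write_latency", "read_latency", "partition_size", "cell_count")


def parse_histogram(text):
    """Return {column: {offset: count}} keeping only non-zero buckets."""
    hist = {name: {} for name in COLUMNS}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6 or not all(f.isdigit() for f in fields):
            continue  # skip the title, header, and unit lines
        offset, *counts = map(int, fields)
        for name, count in zip(COLUMNS, counts):
            if count:
                hist[name][offset] = count
    return hist


hist = parse_histogram(SAMPLE)
print(hist["sstables"])        # {1: 107, 2: 2}
print(hist["partition_size"])  # {1250: 5}
print(hist["cell_count"])      # {10: 5}
```

Each inner dict reads the way the slides describe it: 107 reads touched 1 SSTable, 5 partitions are about 1250 bytes, and 5 partitions have 10 cells.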

Page 28: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Histograms + Data Model

•Your data model is the key to success
•How do you ensure that?

Test

Measure

Repeat

Page 29: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Real World Example

•Real Customer
•Needed very tight SLA on reads

Problem

•Read response highly variable
•Loading data increases latency

Page 30: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       2016550   0              0             0               0
2       2064495   0              0             0               0
3       434526    0              0             0               0
4       51084     0              0             0               0
5       0         0              0             0               0
6       0         0              0             0               0
7       0         0              0             0               0
8       0         0              0             0               0
10      0         0              0             0               1629
12      0         0              0             0               2971
14      0         0              0             0               1286
17      0         0              0             0               68
20      0         0              0             0               188
24      0         0              0             0               101
29      0         0              0             0               50799
35      0         0              0             0               269
42      0         0              0             0               132414
50      0         0              0             0               32943
60      0         0              0             0               62099
72      0         0              0             0               116855
86      0         0              0             0               41562
103     0         0              0             0               42796
124     0         0              0             0               46719
149     0         0              0             0               57693
179     0         0              3             0               27659
215     0         0              18            0               26941
258     0         0              47            0               21589
310     0         0              71            0               19494
372     0         0              141           0               8681
446     0         0              67            0               9499
535     0         0              36466         1629            9360
642     0         0              263829        0               4349
770     0         0              608488        2971            4242
924     0         0              209549        1468            2422
1109    0         0              398845        59              1685
1331    0         0              625099        45105           954
1597    0         0              462636        5731            610
1916    0         0              499920        132391          366
2299    0         0              380787        16265           303
2759    0         0              285323        20015           188
3311    0         0              202417        30980           106
3973    0         0              148920        44973           64
4768    0         0              106452        38502           55
5722    0         0              81533         69479           23
6866    0         0              55470         39218           15
8239    0         0              43512         23027           3
9887    0         0              30810         58498           2
11864   0         0              22375         73629           0
14237   0         0              15148         33444           1
17084   0         0              12047         28321           0
20501   0         0              11298         17021           0
24601   0         0              9652          13072           3
29521   0         0              6715          7790            0
35425   0         0              13788         7764            0
42510   0         0              15322         5890            0
51012   0         0              8585          4046            0
61214   0         0              5041          2973            0
73457   0         0              2892          1954            0
88148   0         0              1543          936             0
105778  0         0              900           661             0
126934  0         0              486           409             0
152321  0         0              285           289             0
182785  0         0              124           178             0
219342  0         0              35            126             0
263210  0         0              8             76              0
315852  0         0              0             68              0

• Compactions behind
• Disk IO problems
• How to optimize?

Page 31: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Offset  SSTables  Write Latency  Read Latency  Partition Size  Cell Count
                  (micros)       (micros)      (bytes)
1       2045656   0              0             0               0
2       1813961   0              0             0               0
3       70496     0              0             0               0
4       0         0              0             0               0
5       0         0              0             0               0
6       0         0              0             0               0
7       0         0              0             0               0
8       0         0              0             0               0
10      0         0              0             0               47
12      0         0              0             0               860
14      0         0              0             0               393
17      0         0              0             0               50
20      0         0              0             0               0
24      0         0              0             0               21
29      0         0              0             0               34489
35      0         0              0             0               32
42      0         0              0             0               97226
50      0         0              0             0               24490
60      0         0              0             0               47077
72      0         0              0             0               94761
86      0         0              0             0               32559
103     0         0              0             0               33885
124     0         0              0             0               37051
149     0         0              1             0               48429
179     0         0              17            0               23272
215     0         0              95            0               22459
258     0         0              84            0               17953
310     0         0              174           0               16178
372     0         0              53082         0               7123
446     0         0              318074        0               7836
535     0         0              423140        47              7904
642     0         0              382926        0               3552
770     0         0              365670        860             3525
924     0         0              414824        392             1998
1109    0         0              442701        46              1411
1331    0         0              335862        30325           757
1597    0         0              302920        4082            518
1916    0         0              236448        97224           294
2299    0         0              171726        11843           254
2759    0         0              122880        15160           162
3311    0         0              90413         23484           89
3973    0         0              66682         34799           62
4768    0         0              53385         29619           54
5722    0         0              39121         53155           23
6866    0         0              26828         30702           12
8239    0         0              18930         18627           3
9887    0         0              12517         47739           2
11864   0         0              8269          61853           0
14237   0         0              6049          28875           1
17084   0         0              4614          24391           0
20501   0         0              5868          14450           0
24601   0         0              6167          11112           0
29521   0         0              2879          6609            0
35425   0         0              2054          6654            0
42510   0         0              8913          4986            0
51012   0         0              4429          3352            0
61214   0         0              1541          2465            0
73457   0         0              560           1607            0
88148   0         0              192           809             0
105778  0         0              59            523             0
126934  0         0              19            333             0
152321  0         0              0             262             0

2 ms!

Less seeks

• Tuned data disk
• Compactions better
• 1 less seek overall
• Further tuning made it even better!

What about the partition size?
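One way to put a number on "less seeks" is the average number of SSTables touched per read, weighted by the counts in the SSTables column. The sketch below copies the non-zero SSTables buckets from the two histograms (before and after tuning); `mean_offset` is a hypothetical helper, and the average is only one possible summary of the change.

```python
def mean_offset(buckets):
    """Weighted mean of the bucket offsets, using the counts as weights."""
    total = sum(buckets.values())
    return sum(offset * count for offset, count in buckets.items()) / total


# SSTables column: {SSTables read: number of reads}
before = {1: 2016550, 2: 2064495, 3: 434526, 4: 51084}
after = {1: 2045656, 2: 1813961, 3: 70496}

print(f"before: {mean_offset(before):.2f} SSTables per read")  # before: 1.68 SSTables per read
print(f"after:  {mean_offset(after):.2f} SSTables per read")   # after:  1.50 SSTables per read
```

The worst case also drops from 4 SSTables to 3, which matches the "1 less seek overall" note.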

Page 32: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Partition Size

•Tuning is an option based on size in bytes
•All about the reads
•index_interval
  •How many samples taken
  •Lower for faster access but more memory usage
•column_index_size_in_kb
  •Add column indexes to a row when the data reaches this size
  •Partial row reads? Maybe smaller.

Page 33: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Tuning results

•Spent a lot of time tuning disk
•Played with
  •index_interval (Lowered)
  •concurrent_reads (Increased)
  •column_index_size_in_kb (Lowered)

220 Million Ops/Day

10000 Transactions/Sec Peak

9ms at 95th percentile. Measured at the application!
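To put the two throughput figures on the same scale, the daily total can be converted to an average per second and compared with the peak. This is plain arithmetic on the slide's numbers; the variable names are made up for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60

ops_per_day = 220_000_000
peak_per_sec = 10_000

average_per_sec = ops_per_day / SECONDS_PER_DAY
peak_to_average = peak_per_sec / average_per_sec

print(f"average: {average_per_sec:.0f} ops/sec")      # average: 2546 ops/sec
print(f"peak is {peak_to_average:.1f}x the average")  # peak is 3.9x the average
```

A load test sized for the daily average would miss the peak by roughly a factor of four.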

Page 34: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
1       27425403  0              0             0         0
2       0         0              0             0         0
3       0         0              0             0         0
4       0         0              1             0         0
5       0         0              24            0         0
6       0         0              56            0         0
7       0         0              92            0         0
8       0         0              283           0         0
10      0         0              2834          0         0
12      0         0              11954         0         0
14      0         0              32621         0         1218345
17      0         0              135311        0         0
20      0         0              314195        0         0
24      0         0              610665        0         0
29      0         0              536736        0         0
35      0         0              162541        0         0
42      0         0              25277         0         0
50      0         0              7847          0         0
60      0         0              5864          0         0
72      0         0              9580          0         0
86      0         0              5517          0         0
103     0         0              3822          0         0
124     0         0              1850          0         0
149     0         0              394           0         0
179     0         0              253           0         0
215     0         0              305           0         0
258     0         0              4657297       0         0
310     0         0              12748409      0         0
372     0         0              7475534       0         0
446     0         0              263549        0         0
535     0         0              217171        0         0
642     0         0              41908         1218345   0
770     0         0              24876         0         0
924     0         0              13566         0         0
1109    0         0              10875         0         0
1331    0         0              9379          0         0
1597    0         0              7111          0         0
1916    0         0              5333          0         0
2299    0         0              5072          0         0
2759    0         0              3987          0         0
3311    0         0              5290          0         0
3973    0         0              5169          0         0
4768    0         0              2867          0         0
5722    0         0              2093          0         0
6866    0         0              3177          0         0
8239    0         0              2161          0         0
9887    0         0              1552          0         0
11864   0         0              1200          0         0
14237   0         0              834           0         0
17084   0         0              1380          0         0
20501   0         0              6219          0         0
24601   0         0              4977          0         0
29521   0         0              2114          0         0
35425   0         0              6479          0         0
42510   0         0              18417         0         0
51012   0         0              5532          0         0

• The two hump problem
• Reads awesome until
  • Compaction!
• Solution:
  • Throttle down compaction
  • Tune disk
  • Ignore it
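The two humps can be picked out of the Read Latency column mechanically. The sketch below looks for buckets that are higher than both neighbours and hold at least 1% of all reads; the 1% cutoff and the `dominant_peaks` helper are assumptions for illustration, and the tail of the histogram above the 924 bucket is left out for brevity.

```python
# Read Latency column from the slide: {offset in micros: number of reads}.
# Buckets above 924 are omitted here to keep the example short.
READ_LATENCY = {
    4: 1, 5: 24, 6: 56, 7: 92, 8: 283, 10: 2834, 12: 11954, 14: 32621,
    17: 135311, 20: 314195, 24: 610665, 29: 536736, 35: 162541, 42: 25277,
    50: 7847, 60: 5864, 72: 9580, 86: 5517, 103: 3822, 124: 1850, 149: 394,
    179: 253, 215: 305, 258: 4657297, 310: 12748409, 372: 7475534,
    446: 263549, 535: 217171, 642: 41908, 770: 24876, 924: 13566,
}


def dominant_peaks(hist, min_share=0.01):
    """Offsets that are local maxima and hold at least min_share of all samples."""
    offsets = sorted(hist)
    total = sum(hist.values())
    peaks = []
    for prev, cur, nxt in zip(offsets, offsets[1:], offsets[2:]):
        is_local_max = hist[cur] > hist[prev] and hist[cur] > hist[nxt]
        if is_local_max and hist[cur] / total >= min_share:
            peaks.append(cur)
    return peaks


print(dominant_peaks(READ_LATENCY))  # [24, 310]
```

One hump sits around 24 microseconds and the other around 310 microseconds, which fits the slide's point that reads are fast until compaction competes for the disk.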

Page 35: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

Disk + Data Model

•Understand the internals
  •Size of partition
  •Compaction
•Learn how to measure
•Load test

Page 36: C* Summit EU 2013: Apache Cassandra 2.0 — Data Model on Fire

More? My data modeling talks:

The Data Model is Dead, Long Live the Data Model

Become a Super Modeler

The World's Next Top Data Model

Thank you! Time for questions...