Cassandra from the trenches: migrating Netflix

Jason Brown, Senior Software Engineer, Netflix. @jasobrown, [email protected], http://www.linkedin.com/in/jasedbrown

description

Slide deck on migrating Netflix to Cassandra in EC2 from a legacy, DC-bound relational database.

Transcript of Cassandra from the trenches: migrating Netflix

Page 1: Cassandra from the trenches: migrating Netflix

Cassandra from the trenches: migrating Netflix

Jason Brown, Senior Software Engineer

Netflix

@jasobrown [email protected]

http://www.linkedin.com/in/jasedbrown

Page 2: Cassandra from the trenches: migrating Netflix

Your host for the evening

• Sr. Software Engineer at Netflix > 3 years
– Currently lead a team developing and operating AB testing infrastructure in EC2
– Spent time migrating core e-commerce functionality out of PL/SQL and scaling it up

• MLB Advanced Media
– Ran the Ecommerce engineering group

• Wandered about in the wireless space (J2ME, BREW)

Page 3: Cassandra from the trenches: migrating Netflix

History

• In the beginning, there was the webapp
– And a database, too
– In one datacenter

• Then we grew, and grew, and grew
– More databases, all conjoined
– Database links with PL/SQL and materialized views
– Multi-master replication

Page 4: Cassandra from the trenches: migrating Netflix

History, 2

• Then it melted down (2008)
– Oracle MMR between two databases
– SPOF: one Oracle instance for the website (no backup)

• Couldn’t ship DVDs for ~3 days

Page 5: Cassandra from the trenches: migrating Netflix

History, 3

• Time to rethink everything
– Abandon the datacenter for EC2
• We’re not in the business of building datacenters
– Ditch the monolithic webapp for distributed systems
• Greater independence for all teams/initiatives
– Migrate the SPOF database to …

Page 6: Cassandra from the trenches: migrating Netflix

History, 4

• SimpleDB/S3
– Somebody else manages your database (yeah!)
– Tried it out, but it didn’t quite work well for us
– High latency, rate limiting (throttling), no auto-sharding, backup problems

• Time to try out one of them (other) newfangled NoSQL things…

Page 7: Cassandra from the trenches: migrating Netflix

Shiny new toy

• We selected Cassandra
– The Dynamo model appealed to us
– The column-based, key-value data model seemed sufficient for most needs
– Performance looked great (in rudimentary tests)

• Now what?
– Put something into it
– Run it in EC2
– Sounds easy enough…

Page 8: Cassandra from the trenches: migrating Netflix

• Data Modeling
– Where the rubber meets the road

Page 9: Cassandra from the trenches: migrating Netflix

About Netflix’s AB Testing

• We use it everywhere (no, really)
• Basic concepts
– Test: an experiment in which several competing behaviors are implemented and compared
– Cell: the different experiences within a test that are compared against each other
– Allocation: a customer-specific assignment to a cell within a test
• A customer can only be in one cell of a test at a time
• Generally immutable (very important for analysis)

Page 10: Cassandra from the trenches: migrating Netflix

Data Modeling - background

• AB has two sets of data
– Metadata about tests
– Allocations

• Both need to be migrated out of Oracle and into Cassandra in the cloud

Page 11: Cassandra from the trenches: migrating Netflix

AB - allocations

• A single table holds allocations
– Currently at ~950 million records
– Plus indices!

• One record for every test that every customer is allocated into

• Unique constraint on customer/test

Page 12: Cassandra from the trenches: migrating Netflix

AB - metadata

• Fairly typical parent-child table relationship
• Not updated frequently, so the service can cache it

Page 13: Cassandra from the trenches: migrating Netflix

Data modeling in cassandra

• Everywhere I looked, the internets told me to understand my data use patterns
– Understand the questions that you need to answer from the data
• Meaning: know how to query your data; structure the persistence model to match

• There’s no free lunch here, apparently

Page 14: Cassandra from the trenches: migrating Netflix

Identifying the AB questions that need to be answered

• Get all allocations for a customer
• Get the count of customers in a test/cell
• Find all customers in a test/cell
– So we can kick them out of the test
– So we can clean up ancient data
– So we can move them to a different cell in the test

• Find all customers allocated to a test within a date range
– So we can kick them out of the test

Page 15: Cassandra from the trenches: migrating Netflix

Modeling allocations in cassandra

• As we’re read-heavy, read all allocations for a customer as fast as possible
– Denormalize allocations into a single row
– But how do I denormalize?

• Find all customers in a test/cell = reverse index

• Get the count of customers in a test/cell = count the entries in the reverse index

Page 16: Cassandra from the trenches: migrating Netflix

Denormalization-HOWTO

• The internets talk about it, but with no real-world examples
– “Normalization is for sissies”, Pat Helland

• Denormalizing allocations per customer
– Trivial with a schema-less database

Page 17: Cassandra from the trenches: migrating Netflix

Denormalized allocations

• Sample normalized data

• Sample denormalized data (sparse!)
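
The sample tables from this slide did not survive the transcript, so here is a minimal sketch of what the denormalization looks like (Python for illustration only; the record fields and values are invented, not Netflix's actual schema):

```python
# Normalized: one record per (customer, test), as in the Oracle table.
normalized = [
    {"customer_id": 100, "test_id": 42, "cell": 2, "enabled": "Y"},
    {"customer_id": 100, "test_id": 47, "cell": 0, "enabled": "Y"},
    {"customer_id": 200, "test_id": 42, "cell": 1, "enabled": "Y"},
]

def denormalize(records):
    """Group every allocation for a customer into a single wide row."""
    rows = {}
    for r in records:
        row = rows.setdefault(r["customer_id"], {})
        # Each data point becomes its own sparse column: "<testId>:<field>"
        row["%d:cell" % r["test_id"]] = r["cell"]
        row["%d:enabled" % r["test_id"]] = r["enabled"]
    return rows

rows = denormalize(normalized)
# One row read now returns every allocation for customer 100:
assert rows[100] == {"42:cell": 2, "42:enabled": "Y",
                     "47:cell": 0, "47:enabled": "Y"}
```

The rows are sparse because each customer is allocated into a different set of tests, so no two rows need share the same column names.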

Page 18: Cassandra from the trenches: migrating Netflix

Implementing allocations

• As the allocations for a customer are each just a handful of data points, they can logically be grouped together

• Hello, super columns
• Avoided blobs, JSON or otherwise
– Data race concerns
– BI integration
– Serialization algorithm changes could tank the data

Page 19: Cassandra from the trenches: migrating Netflix

Implementing allocations, second round

• But the Cassandra devs secretly despise (er, don’t enjoy) super columns

• Switched to a standard column family, using composite columns

• Composite columns are sorted by each ‘token’ in the name
– This sorts each allocation’s data together (by testId)

Page 20: Cassandra from the trenches: migrating Netflix

Composite columns

• Allocation column naming convention: <testId>:<field>
– 42:cell = 2
– 42:enabled = Y
– 47:cell = 0
– 47:enabled = Y

• Using terse field names, but there is still per-column name overhead (~15 bytes)
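
A sketch of why the token-by-token sorting matters (Python for illustration; the column names follow the slide's convention, everything else is invented). Each ':'-separated token is compared on its own, testId numerically, so a test's columns land next to each other, which a plain string sort would not guarantee:

```python
def composite_key(name):
    """Split '<testId>:<field>' and compare testId as a number, then field."""
    test_id, field = name.split(":", 1)
    return (int(test_id), field)

columns = ["47:enabled", "100:cell", "42:cell", "47:cell", "42:enabled"]
ordered = sorted(columns, key=composite_key)
assert ordered == ["42:cell", "42:enabled", "47:cell", "47:enabled", "100:cell"]

# A plain string sort would wrongly put "100:cell" first ("1" < "4"):
assert sorted(columns)[0] == "100:cell"
```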

Page 21: Cassandra from the trenches: migrating Netflix

Implementing indices

• Cassandra’s secondary indices vs. hand-built and maintained alternate indices

• Secondary indices work great on uniform data between rows

• But sparse column data is not so easy

Page 22: Cassandra from the trenches: migrating Netflix

Hand-built Indices, 1

• Reverse index
– Test/cell (key) to custIds (columns)
• Column value is the timestamp

• Mutate on allocating a customer into a test
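
A minimal sketch of this reverse index, with an in-memory dict standing in for the column family (Python for illustration; the key format and ids are invented):

```python
import time

# Row key = "testId:cell", columns = customer ids, value = allocation time.
reverse_index = {}

def allocate(customer_id, test_id, cell):
    """Allocating a customer also mutates the reverse-index row."""
    row_key = "%d:%d" % (test_id, cell)
    reverse_index.setdefault(row_key, {})[customer_id] = time.time()

allocate(100, 42, 2)
allocate(200, 42, 2)
# All customers in test 42, cell 2, read from a single row:
assert sorted(reverse_index["42:2"]) == [100, 200]
```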

Page 23: Cassandra from the trenches: migrating Netflix

Hand-built indices, 2

• Counter column family
– Test/cell (key) to the count of customers in that test/cell
– Mutate on allocating a customer into a test

• Counters are not idempotent!
• Mutates need to write to every node that hosts that key
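
A sketch of why non-idempotent counters hurt (Python simulation, not real Cassandra): if an increment times out but actually landed, the client retry counts the same allocation twice.

```python
# In-memory dict stands in for the counter column family.
counters = {"42:2": 0}

def increment(key):
    counters[key] += 1

increment("42:2")   # original mutate
increment("42:2")   # client retry after an ambiguous timeout
assert counters["42:2"] == 2   # one customer, counted twice
```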

Page 24: Cassandra from the trenches: migrating Netflix

Index rebuilding

• Yeah, even Oracle needs to have its indices rebuilt

• Easy enough to rebuild the reverse index, but how about that counter column?
– Read the reverse index for the count and write that as the counter’s value
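
The rebuild step above can be sketched as follows (Python for illustration; dicts stand in for the two column families and the values are invented):

```python
# Reverse index rows: "testId:cell" -> {customerId: timestamp}
reverse_index = {"42:2": {100: 1, 200: 1}, "42:0": {300: 1}}
counters = {"42:2": 999, "42:0": 999}   # drifted values to be repaired

def rebuild_counters(index, counters):
    """Count each reverse-index row's columns; write that as the counter."""
    for row_key, customers in index.items():
        counters[row_key] = len(customers)

rebuild_counters(reverse_index, counters)
assert counters == {"42:2": 2, "42:0": 1}
```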

Page 25: Cassandra from the trenches: migrating Netflix

Modeling AB metadata in cassandra

• Explored several models, including JSON blobs, spreading data across multiple CFs, and differing degrees of denormalization

• Reverse index to identify all tests for loading

Page 26: Cassandra from the trenches: migrating Netflix

Implementing metadata

• One CF, one row for all of a test’s data
– Every data point is a column; no blobs

• Composite columns: type:id:field
– Type = base info, cells, allocation plans
– Id = cell number, allocation plan (gu)id
– Field = type-specific
• Base info: test name, description, enabled
• Cell: name/description
• Plan: start/end dates, country to allocate to

Page 27: Cassandra from the trenches: migrating Netflix

Into the real world … here comes the hurt

Page 28: Cassandra from the trenches: migrating Netflix

Allocation mutates

• AB allocations are immutable, so how do you prevent mutation?
– Oracle: a unique constraint on the table
– Cassandra: read before write

• A read before a write in a distributed system is a data race
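
The race can be sketched in a single-threaded simulation (Python for illustration; two "writers" are interleaved by hand, no real Cassandra involved): both read "absent", so both write, and the supposedly immutable allocation gets overwritten.

```python
allocations = {}

def read(customer_id, test_id):
    return allocations.get((customer_id, test_id))

def write(customer_id, test_id, cell):
    allocations[(customer_id, test_id)] = cell

# Interleaving: both writers pass the existence check before either writes.
a_saw = read(100, 42)     # writer A: no allocation yet
b_saw = read(100, 42)     # writer B: no allocation yet
assert a_saw is None and b_saw is None
write(100, 42, 2)         # A allocates cell 2
write(100, 42, 0)         # B clobbers it with cell 0 -- the race
assert read(100, 42) == 0
```

Unlike Oracle's unique constraint, nothing here makes the check and the write atomic, which is exactly the problem the slide describes.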

Page 29: Cassandra from the trenches: migrating Netflix

Running cassandra

• Compactions happen
– Part of the Cassandra lifestyle
– Mutations are written to memory (memtable)
– Flushed to disk (sstable) on a triggering threshold
• Time
• Size
• Operations against the column family
– Eventually, Cassandra decides to merge sstables as the data for individual rows becomes scattered

Page 30: Cassandra from the trenches: migrating Netflix

Compactions, 2

• Spikes happen, especially on read-heavy systems
– Everything can slow down
– Sometimes, average latency > 95th-percentile latency
– Throttling in newer Cassandra versions helps, I think
– Affects clients (Hector, Astyanax)

Page 31: Cassandra from the trenches: migrating Netflix

Repairs

• Different from read repair!
• Fixes all the data in a single node by pulling shared ranges from neighbor nodes

Page 32: Cassandra from the trenches: migrating Netflix

Repairs, 2

• The replication factor determines the number of nodes involved in the repair of a single node

• Neighbor nodes will perform a validation compaction
– Pushes disk and network hard, depending on data size

• Guess what happens when you run a multi-region cluster?

Page 33: Cassandra from the trenches: migrating Netflix

Client libraries

• Round-robin is not the way to go for connection pooling
– Coordinator Cassandra nodes will incorrectly be marked down, rather than the slow target node

• Token-aware is safer and faster, but harder to implement
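
A toy sketch of the token-aware idea (Python for illustration; the node names and tokens are invented, and md5 only loosely mirrors the era's RandomPartitioner): hash the row key to a token and go straight to the owning node instead of round-robining through an arbitrary coordinator.

```python
import hashlib
from bisect import bisect_right

# A tiny ring: (token, node) pairs, sorted by token.
RING = [(0, "node-a"), (2**125, "node-b"), (2**126, "node-c")]
TOKENS = [t for t, _ in RING]

def owner(row_key):
    """Map a row key to its token, then to the node owning that range."""
    token = int(hashlib.md5(row_key.encode()).hexdigest(), 16) % 2**127
    idx = bisect_right(TOKENS, token) - 1   # last node whose token <= key token
    return RING[idx][1]

assert owner("customer:100") in {"node-a", "node-b", "node-c"}
assert owner("customer:100") == owner("customer:100")   # deterministic routing
```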

Page 34: Cassandra from the trenches: migrating Netflix

Tunings, 1

• Key and row caches
– Left unbounded, they can chew up JVM memory needed for normal work
– Latencies will spike as the JVM has to fight for memory
– The off-heap row cache is better, but it still maintains data structures on-heap

Page 35: Cassandra from the trenches: migrating Netflix

Tunings, 2

• mmap() as an in-memory cache
– When the process is terminated, the mmap’d pages are just added to the free list, so the OS page cache stays warm across restarts

Page 36: Cassandra from the trenches: migrating Netflix

Tunings, 3

• Sizing memtable flushes to optimize compactions
– Easier when writes are uniformly distributed over time; flush patterns are easier to reason about
– Best to optimize flushes based on memtable size, not time

Page 37: Cassandra from the trenches: migrating Netflix

Tunings, 4

• Sharding – not dead yet!
– If a single row has disproportionately high gets/mutates, the nodes holding it will become hot spots

– If a row grows too large, it won’t fit into memory
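
One common client-side remedy (a sketch under invented names, not the deck's actual code) is to spread a hot or oversized logical row across N physical rows by suffixing the key:

```python
import zlib

NUM_SHARDS = 4   # invented shard count for illustration

def shard_key(row_key, column_name):
    # Deterministic shard choice so a column always lands in the same row.
    shard = zlib.crc32(column_name.encode()) % NUM_SHARDS
    return "%s#%d" % (row_key, shard)

def all_shard_keys(row_key):
    """A full read of the logical row must fan out to every shard."""
    return ["%s#%d" % (row_key, s) for s in range(NUM_SHARDS)]

assert shard_key("test:42", "customer:100") in all_shard_keys("test:42")
assert len(all_shard_keys("test:42")) == NUM_SHARDS
```

The trade-off: writes stay cheap and spread across nodes, but a read of the whole logical row becomes a multi-row fan-out.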

Page 38: Cassandra from the trenches: migrating Netflix

Takeaways

• Netflix is making all of its components distributed and fault-tolerant as we grow domestically and internationally.

• Cassandra is a core piece of our cloud infrastructure.

Page 40: Cassandra from the trenches: migrating Netflix

References

• Pat Helland, “Normalization Is for Sissies”, http://blogs.msdn.com/b/pathelland/archive/2007/07/23/normalization-is-for-sissies.aspx

• btoddb, “Storage Sizing” http://btoddb-cass-storage.blogspot.com/