Cassandra Day SV 2014: Scaling Hulu’s Video Progress Tracking Service with Apache Cassandra

CASSANDRA DAY SILICON VALLEY 2014 – APRIL 7TH – MATT JURIK

SCALING VIDEO PROGRESS TRACKING

MATT JURIK SOFTWARE DEVELOPER

WHAT IS HULU?


HULU’S MISSION

Help people find and enjoy the world’s premium content when, where and how they want it.


•  Service-oriented architecture
•  Follow the Unix philosophy
    •  Small services with specialized scopes
    •  Small teams focusing on specific areas
•  Right tool for the job
    •  Many languages, frameworks, formats
•  Cross-team development encouraged
    •  If something you depend on needs fixing, feel free to fix it

VIDEO PROGRESS TRACKING CODENAME: HUGETOP


AGENDA

•  Old architecture

•  New architecture

•  Keyspace design

•  Migrating to Cassandra

•  Operations


OLD ARCHITECTURE (MYSQL)

HUGETOP (PYTHON)

OTHER SERVICES, DEVICES, HULU.COM

64 Redis Shards (Persistence-enabled)

API (PYTHON)

8 MySQL Shards


NEW ARCHITECTURE (C*)

HUGETOP (PYTHON)

OTHER SERVICES, DEVICES, HULU.COM

64 Redis Shards (Cache-only)

CRAPI (JAVA)

8 Cassandra Nodes

WHY SWITCH?

The dilemma
•  Unbounded data growth
•  MySQL is very stable, but the servers are running out of space
•  “Manually resharding is fun!” – No one, ever

Why Cassandra?
•  Our data fits Cassandra’s data model well
•  Cassandra promises (and delivers) great scalability
•  Highly available
•  Multi-DC


INTERACTION BETWEEN REDIS + CASSANDRA

HUGETOP

64 Redis Shards (Cache-only)

CRAPI

8 Cassandra Nodes

Video position updates
1.  Write position info to Cassandra
2.  Update Redis

Video position requests
Check Redis:
    If data is loaded in Redis, return it.
    Else:
        Fetch user’s history from Cassandra,
        Queue job to update Redis,
        Return data fetched from Cassandra.

Redis
•  Maintains complex indices
•  Enriches data by simulating joins with Lua

Cassandra
•  Provides durability
•  Replenishes Redis as necessary
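The update and request flows above can be sketched in a few lines of Python. This is a toy model, not Hulu's code: plain dicts stand in for Redis and Cassandra, a deque stands in for the job queue, and every name here (update_position, get_positions, run_refill_jobs) is hypothetical. It also assumes that a write only touches the cache when the user's history is already loaded there, which the slides do not specify.

```python
from collections import deque

# Stand-ins (assumptions): dicts replace the real stores, a deque replaces the job queue.
cassandra = {}          # durable store: user_id -> {video_id: position}
redis_cache = {}        # cache: user_id -> {video_id: position}, only for loaded users
refill_queue = deque()  # user_ids whose history should be reloaded into the cache


def update_position(user_id, video_id, position):
    """Write path: durable store first, then the cache."""
    cassandra.setdefault(user_id, {})[video_id] = position
    # Assumption: only update the cache if this user's history is already loaded,
    # so the cache never holds a partial history.
    if user_id in redis_cache:
        redis_cache[user_id][video_id] = position


def get_positions(user_id):
    """Read path: serve from the cache, else fall back to the durable store."""
    if user_id in redis_cache:
        return redis_cache[user_id]
    history = dict(cassandra.get(user_id, {}))
    refill_queue.append(user_id)  # reload the cache asynchronously
    return history


def run_refill_jobs():
    """Drain the queue, copying each user's history from the store into the cache."""
    while refill_queue:
        user_id = refill_queue.popleft()
        redis_cache[user_id] = dict(cassandra.get(user_id, {}))
```

A first read for a user misses the cache and is answered from the durable store; once the queued job has run, later reads and writes for that user go through the cache as well.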

HARDWARE CONSIDERATIONS

Take one
•  Hadoop-class machines
    •  Physical boxes (i.e., no VMs)
    •  6 standard 7200 rpm drives
    •  32 GB RAM
    •  Leveled compaction + JBOD
    •  Write throughput: good
    •  Read latency: poor

Take two
•  SSD-based machines
    •  Physical boxes (C-states disabled)
    •  550 GB RAID 5
    •  48 GB RAM
    •  Leveled compaction
    •  Write throughput: good
    •  Read latency: good
•  16 nodes split between 2 DCs


KEYSPACE DESIGN

Copy 1
•  Query last position for user=X, video=Y
•  Query last position for user=X, video=*

    CREATE TABLE views (
        u int,        -- User ID
        v int,        -- Video ID
        c boolean,    -- Is completed?
        p float,      -- Video position
        t timestamp,  -- Last viewed at
        ...,          -- Other fields
        PRIMARY KEY (u, v)
    );

Copy 2
•  Daily log of all views needed by other services
•  Two tables: one for updates; one for deletes
•  Shard data across rows
•  TTL’d

    CREATE TABLE daily_user_views (
        s int,   -- Partition key
        u int,   -- User ID
        v int,   -- Video ID
        ...,     -- Other fields
        PRIMARY KEY (s, u, v)
    );

SHARDING!?

•  A single row containing one day’s worth of data is too big and causes hotspots
•  Fetching a single row cannot be done in parallel, so it is slow
•  Solution: shard each day across 128 rows
    => Spreads data across multiple nodes
    => Query multiple nodes in parallel

Partition key
    userID % 128 + daysBetween(EPOCH, viewDate) * 128

April 7th, 2014 (daysBetween(EPOCH, "April 7th, 2014") = 16167):

    for (int i = 0; i < 128; i++) {
        int k = i % 128 + 16167 * 128;  // partition key of shard i for this day
        execute("SELECT * FROM daily_user_views WHERE s = " + k);
    }
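As a worked example of the partition key formula, here is a minimal Python sketch. It assumes EPOCH is the Unix epoch (1970-01-01), which is consistent with the value 16167 given for April 7th, 2014; the function names are hypothetical and not taken from Hugetop or CRAPI.

```python
from datetime import date

SHARDS_PER_DAY = 128
EPOCH = date(1970, 1, 1)  # assumption: the Unix epoch


def days_between(start, end):
    """Whole days from start to end."""
    return (end - start).days


def partition_key(user_id, view_date):
    """Shard key for one user's views on one day."""
    return user_id % SHARDS_PER_DAY + days_between(EPOCH, view_date) * SHARDS_PER_DAY


def partition_keys_for_day(view_date):
    """All 128 shard keys that together hold one day's views."""
    base = days_between(EPOCH, view_date) * SHARDS_PER_DAY
    return [base + shard for shard in range(SHARDS_PER_DAY)]
```

For April 7th, 2014 the keys run from 16167 * 128 = 2069376 to 2069503. A reader that wants the whole day issues one query per key, and those queries can run in parallel against different nodes.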


MIGRATING FROM MYSQL → CASSANDRA

[Diagram: HUGETOP, MySQL, Cassandra]

1.  Read/write to MySQL
2.  Duplicate writes + deletes to Cassandra
    - column timestamps = last_played_at date (critical for the next step)
    - apply deletions, but also temporarily store them in deletion_ledger
3.  Backfill old data
    Again, write to Cassandra with column timestamp = last_viewed_at date (prevents old position from overwriting new position)
4.  Replay deletions stored in deletion_ledger
    Just like inserts, you can specify a timestamp for deletions.
    column timestamp = time at which original deletion occurred (prevents deleting new data)
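The reason explicit timestamps make the backfill and the deletion replay safe can be shown with a toy last-write-wins cell in Python. This is a simplified model for illustration only, not Cassandra's implementation and not the migration code: the class LwwCell and the integer timestamps are hypothetical. In CQL, the explicit timestamp is supplied with USING TIMESTAMP on the write or delete.

```python
class LwwCell:
    """Toy last-write-wins cell: the write or delete with the highest timestamp wins."""

    def __init__(self):
        self.value = None
        self.timestamp = None  # timestamp of the winning write or delete
        self.deleted = False

    def write(self, value, timestamp):
        # An older write never replaces newer state.
        if self.timestamp is None or timestamp > self.timestamp:
            self.value, self.timestamp, self.deleted = value, timestamp, False

    def delete(self, timestamp):
        # A delete only takes effect if it is at least as new as the current state.
        if self.timestamp is None or timestamp >= self.timestamp:
            self.value, self.timestamp, self.deleted = None, timestamp, True

    def read(self):
        return None if self.deleted else self.value


def migrate_example():
    """Replays steps 2 to 4 for one (user, video) cell and returns the final position."""
    cell = LwwCell()
    cell.write(120.0, timestamp=200)  # step 2: live duplicated write, viewed at t=200
    cell.write(45.0, timestamp=100)   # step 3: backfilled row from MySQL, viewed at t=100
    cell.delete(timestamp=150)        # step 4: replayed deletion that originally happened at t=150
    return cell.read()
```

In migrate_example the backfilled position (t=100) and the replayed deletion (t=150) are both older than the live write (t=200), so the newer position survives regardless of the order in which the three operations arrive.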

OPERATIONS

•  Use internal tool for automating repairs, backups, etc.
•  Metrics
    •  Dump metrics to Graphite via custom -javaagent which hooks into Yammer Metrics
    •  Implement a MetricPredicate to filter boring metrics
•  High-level monitoring (something is usually wrong if):
    •  d(hint count)/dt > 0
    •  Large number of old gen collections
    •  Lots of SSTables in L0 (and not importing data, bootstrapping, etc.)
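The d(hint count)/dt > 0 check listed under high-level monitoring can be approximated from periodic samples of the hint count. The Python sketch below is an assumption about how such a check might look; the sample format (Unix seconds, hint count) and the function names are hypothetical, and how the samples are collected (for example from Graphite) is left out.

```python
def hint_rates(samples):
    """Rate of change of the hint count between consecutive samples.

    samples: list of (unix_seconds, hint_count) tuples in time order.
    Returns hints per second for each pair of consecutive samples.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        if t1 > t0:  # skip duplicate or out-of-order timestamps
            rates.append((c1 - c0) / (t1 - t0))
    return rates


def hints_growing(samples):
    """True if the hint count increased over any sampled interval."""
    return any(rate > 0 for rate in hint_rates(samples))
```

With one sample per minute, a flat hint count yields rates of zero, and any interval in which hints accumulate produces a positive rate that can trigger an alert.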

•  SSTable corruption
    •  nodetool scrub
    •  sstablescrub – if things are really bad
•  Things to watch:
    •  Snapshots are awesome, but can quickly burn disk space
    •  Keep nodes under 50% disk utilization, even if using leveled compaction

THANK YOU

QUESTIONS?