Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) |...

Page 1: Netflix Recommendations Using Spark + Cassandra (Prasanna Padmanabhan & Roopa Tangirala, Netflix) | Cassandra Summit 2016

Netflix Recommendations using Spark + Cassandra

Prasanna Padmanabhan & Roopa Tangirala

Page 2:

Turn on Netflix and the absolute best content for you would automatically start playing

Page 3:

Netflix Recommendations

Page 4:

Netflix Recommendations

Page 5:

Everything is a Recommendation

Rows

Ranking

Over 80% of what members watch comes from our recommendations

Recommendations are driven by Machine Learning Algorithms

Page 6:

Data Driven

Offline Experiment using Historical Data

Online A/B Testing

Rollout Feature to ALL members

Success Success

Fail

Algorithmic Page Generation

Trending Now

Page 7:

Offline Experimentation

Page 8:

Algorithmic Page Generation

Personalizing the ordering of rows on the homepage

Page 9:

Algorithmic Page Generation

Without Algorithmic Page Generation

With Algorithmic Page Generation

Diversity of the Page

Affinity for specific rows

Drawbacks

Page 10:

Algorithmic Page Generation

Production

Page 11:

Algorithmic Page Generation

Production Variant 1

Page 12:

Algorithmic Page Generation

Production Variant 1 Variant 2

Row Distribution

TV/Movie Ratio

Page 13:

Algorithmic Page Generation

Production Variant 1 Variant 2

Evaluate best variant based on the plays

Actual Plays:
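
One way to make "evaluate best variant based on the plays" concrete is to score each generated page by how high the titles a member actually played appear on it. The sketch below is an assumption, not the metric used in the talk: it uses mean reciprocal rank over a page read row by row, and the page layouts and video IDs are made up for illustration.

```python
from typing import Dict, List, Sequence


def flatten(page: Sequence[Sequence[str]]) -> List[str]:
    """Read a page row by row, left to right, dropping repeated videos."""
    seen, order = set(), []
    for row in page:
        for video in row:
            if video not in seen:
                seen.add(video)
                order.append(video)
    return order


def mean_reciprocal_rank(page: Sequence[Sequence[str]], actual_plays: Sequence[str]) -> float:
    """Average of 1/position over the played videos; a video not on the page scores 0."""
    if not actual_plays:
        return 0.0
    position = {video: i + 1 for i, video in enumerate(flatten(page))}
    total = sum(1.0 / position[v] if v in position else 0.0 for v in actual_plays)
    return total / len(actual_plays)


def best_variant(pages: Dict[str, Sequence[Sequence[str]]], actual_plays: Sequence[str]) -> str:
    """Name of the page variant that ranks the actual plays highest."""
    return max(pages, key=lambda name: mean_reciprocal_rank(pages[name], actual_plays))


# Hypothetical pages: each inner list is one row of video IDs.
pages = {
    "production": [["a", "b"], ["c", "d"]],
    "variant_1": [["c", "a"], ["b", "d"]],
    "variant_2": [["d", "c"], ["a", "b"]],
}
actual_plays = ["c", "d"]
scores = {name: mean_reciprocal_rank(page, actual_plays) for name, page in pages.items()}
winner = best_variant(pages, actual_plays)
```

Because the plays come from historical data, the same set of plays can be replayed against any number of candidate page generators before one is promoted to an online A/B test.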

Page 17:

Offline Experiment Architecture

Member Selection

Runs once a day

Ratings Service

S3

Snapshot Snapshot Store

Snapshot Forklift

Viewing History Service

MyList Service

Data Snapshots

Evaluate Metrics

Generate Pages

… …

A/B Test

Page 18:

Data Model - Requirements

• Need for historical service data

• Optimize for Batch Writes and Point Reads

Page 19:

Data Model

COLUMN FAMILY: MYLIST

ROWS (DATE_MEMBER_ID)    COLUMN: MyList
20161009_1001            BLOB
20161009_1002            BLOB
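
The row key on this slide concatenates the snapshot date and the member ID, so a single key lookup returns one member's data as of one day. A minimal sketch of that key scheme follows. The in-memory dict stands in for the column family, and `snapshot_key`, `write_batch`, and `point_read` are hypothetical helper names, not part of any Netflix or Cassandra API.

```python
from datetime import date
from typing import Dict, Iterable, Optional, Tuple


def snapshot_key(day: date, member_id: int) -> str:
    """Build a DATE_MEMBER_ID row key such as 20161009_1001."""
    return f"{day:%Y%m%d}_{member_id}"


def write_batch(store: Dict[str, bytes], day: date, rows: Iterable[Tuple[int, bytes]]) -> int:
    """Write one day's snapshot for many members in one pass (the batch write)."""
    count = 0
    for member_id, blob in rows:
        store[snapshot_key(day, member_id)] = blob
        count += 1
    return count


def point_read(store: Dict[str, bytes], day: date, member_id: int) -> Optional[bytes]:
    """Fetch one member's snapshot for one day (the point read)."""
    return store.get(snapshot_key(day, member_id))


mylist: Dict[str, bytes] = {}  # stand-in for the MYLIST column family
written = write_batch(mylist, date(2016, 10, 9), [(1001, b"blob-a"), (1002, b"blob-b")])
```

Putting the date first in the key means an experiment can ask for "member 1002 as of 9 October 2016" without scanning other days.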

Page 20:

Data Model

COLUMN FAMILY: VIEWING-HISTORY

ROWS (DATE_MEMBER_ID)    COLUMN: ViewingData
20161009_1001            BLOB
20161009_1002            BLOB

Page 21:

Data Model

COLUMN FAMILY: VIEWING-HISTORY

ROWS (DATE_MEMBERID_IDX)    COLUMN: ViewingData
20161009_1001_0             BLOB
20161009_1001_1             BLOB
20161009_1001_2             BLOB
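
The `_IDX` suffix suggests that a large viewing-history blob is split across several rows. That reading is an assumption, as is the chunk size used below. The sketch shows one way to split a blob into indexed rows and put it back together; the function names are hypothetical.

```python
from typing import Dict, List

CHUNK_BYTES = 4  # illustrative only; a realistic limit would be far larger


def chunk_rows(day: str, member_id: int, blob: bytes, chunk_bytes: int = CHUNK_BYTES) -> Dict[str, bytes]:
    """Split blob into rows keyed DATE_MEMBERID_IDX, with IDX = 0, 1, 2, ..."""
    pieces = [blob[i:i + chunk_bytes] for i in range(0, len(blob), chunk_bytes)] or [b""]
    return {f"{day}_{member_id}_{idx}": piece for idx, piece in enumerate(pieces)}


def reassemble(rows: Dict[str, bytes], day: str, member_id: int) -> bytes:
    """Read chunks 0, 1, 2, ... until a key is missing, then concatenate them."""
    parts: List[bytes] = []
    idx = 0
    while f"{day}_{member_id}_{idx}" in rows:
        parts.append(rows[f"{day}_{member_id}_{idx}"])
        idx += 1
    return b"".join(parts)


rows = chunk_rows("20161009", 1001, b"0123456789")
restored = reassemble(rows, "20161009", 1001)
```

Each chunk is still addressable by a point read, so the read path stays the same as in the unchunked layout apart from the loop over indexes.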

Page 22:

Online A/B Testing

Page 23:

Trending Now

Videos that are Trending and Personalized for you

Page 24:

Trending Now

It’s 7 PM on a Monday

Page 25:

Trending Now

It’s 10 PM on a Saturday

Page 26:

Trending Now

Pokémon

Page 27:

Fast Feedback Loop

UI

Data Systems

Streaming Apps

Rec Systems

Page 28:

Trending Now - Data Infrastructure

Impression Service

Viewing History Service

UI

Online Services

Trends Store

Compute Trends

Model Training

Captures videos shown in view port

Captures videos played by members

Publish Models

Viewing History Service

Ratings …

Page 29:

State Management in Cassandra

Video                      Number of Plays
Stranger Things            100
Narcos                     200
Orange is the new Black    300

Page 30:

State Management in Cassandra

Trends Store

State Present?

Compute Trends

Yes

No

Init State from Cassandra

Load State

Update State

Read Events
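
Read as a flow, the boxes above suggest that the trend computation keeps play counts in memory, seeds them from Cassandra when no in-memory state exists (for example after a restart), and then folds in newly read events. That ordering is an interpretation of the diagram, not something the slide states. The sketch below uses a plain dict as a stand-in for the Trends Store; the class and method names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


class TrendState:
    """Play counts per video, recoverable from a backing store."""

    def __init__(self, trends_store: Dict[str, int]):
        self.trends_store = trends_store       # stand-in for the Cassandra table
        self.counts: Optional[Counter] = None  # no in-memory state at start-up

    def load_state(self) -> Counter:
        """Return the in-memory state, initialising it from the store if absent."""
        if self.counts is None:
            self.counts = Counter(self.trends_store)
        return self.counts

    def update(self, play_events: Iterable[str]) -> None:
        """Fold a batch of play events into the state and write it back."""
        counts = self.load_state()
        counts.update(play_events)
        self.trends_store.update(counts)


store = {"Stranger Things": 100, "Narcos": 200}
state = TrendState(store)
state.update(["Narcos", "Narcos", "Stranger Things"])

# Simulate a restart: a fresh object has no in-memory state and reloads from the store.
restarted = TrendState(store)
restarted.update(["Narcos"])
```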

Page 31:

Data Model - Requirements

• Trending data is for a specific interval of time

• Optimize for Batch Writes and Batch Reads

Page 32:

Data Model

COLUMN FAMILY: Interval 1, Interval 2 … Interval N

ROWS (VIDEOID_METADATA)    COLUMN: Plays    COLUMN: Impressions
101_METADATA               BLOB             BLOB
102_METADATA               BLOB             BLOB
103_METADATA               BLOB             BLOB
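
Because trending data belongs to a specific time interval, the layout above keeps one column family per interval and one row per video. The sketch below shows how play and impression events might be bucketed that way. It assumes fixed-width intervals and an `interval_<n>` naming scheme, neither of which the slides specify, and it stores plain counts where the slides store BLOBs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

INTERVAL_SECONDS = 3600  # assumed interval width

# (epoch_seconds, video_id, kind) where kind is "play" or "impression"
Event = Tuple[int, int, str]


def bucket_events(events: Iterable[Event], interval_seconds: int = INTERVAL_SECONDS):
    """Return {column_family: {row_key: {"Plays": n, "Impressions": n}}}."""
    tables: Dict[str, Dict[str, Dict[str, int]]] = defaultdict(
        lambda: defaultdict(lambda: {"Plays": 0, "Impressions": 0})
    )
    for ts, video_id, kind in events:
        family = f"interval_{ts // interval_seconds}"
        column = "Plays" if kind == "play" else "Impressions"
        tables[family][f"{video_id}_METADATA"][column] += 1
    return tables


events = [
    (10, 101, "impression"),
    (20, 101, "play"),
    (3700, 101, "impression"),
    (3800, 102, "impression"),
]
tables = bucket_events(events)
```

With this shape, a whole interval can be read or discarded as a unit, which fits the batch-write, batch-read requirement on the previous slide.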

Page 33:

Roopa Tangirala

Engineering Manager @ Netflix

Twitter - @roopatangirala

Page 34:

FORKLIFTER

Page 35:

ARCHITECTURE

SOURCE TARGET

Page 36:

USE CASES

Page 37:
Page 38:

APACHE THRIFT CQL

Page 39:
Page 40:

DEMO

Page 41:
Page 42:

WHY NOT DSE SPARK?

Page 43:
Page 44:

SCALABILITY

Page 45:

COST EFFECTIVENESS

Page 46:

LESSONS LEARNT

Page 47:

TTL HANDLING

• TTL Reading And Writing is Asymmetric - CASSANDRA-12216

• Thrift Column TTL vs CQL Row TTL
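
The slide does not spell out the details. Two things a copy job plausibly has to handle are that a column with no TTL reads back as a missing value rather than 0, and that Thrift-era data can carry a different TTL on each column while a CQL write takes one `USING TTL` value per statement. Under those assumptions, the sketch below groups columns by remaining TTL and builds one INSERT per group. It only produces statement text; the table and column names are placeholders.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def statements_by_ttl(
    table: str, key_column: str, column_ttls: Dict[str, Optional[int]]
) -> List[Tuple[str, List[str]]]:
    """Group columns by remaining TTL and return one INSERT per group.

    column_ttls maps column name -> remaining TTL in seconds (None = no TTL).
    Each result is (cql_text, columns_covered_by_that_statement).
    """
    groups: Dict[Optional[int], List[str]] = defaultdict(list)
    for column, ttl in column_ttls.items():
        groups[ttl].append(column)

    statements = []
    for ttl, columns in groups.items():
        names = ", ".join([key_column] + columns)
        marks = ", ".join("?" for _ in range(len(columns) + 1))
        cql = f"INSERT INTO {table} ({names}) VALUES ({marks})"
        if ttl is not None:  # omit the clause entirely when the source had no TTL
            cql += f" USING TTL {ttl}"
        statements.append((cql, columns))
    return statements


stmts = statements_by_ttl("ks.target", "id", {"a": 3600, "b": 3600, "c": 60, "d": None})
```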

Page 48:

PARTITION DIFFERENCES

[Figure: two diagrams. One has six points numbered 1 to 6 with values from 100000 to 600000; the other has points every 25k from 25k to 600k.]

Page 49:

TUNING

• spark.cassandra.connection.keep_alive_ms

• spark.cassandra.connection.timeout_ms

• spark.driver.maxResultSize

Page 50:

OOM EXCEPTIONS

spark.executor.memory

spark.cassandra.input.split.size_in_mb
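
`spark.cassandra.input.split.size_in_mb` sets roughly how much Cassandra data each Spark partition reads, so lowering it yields more, smaller partitions, which is one lever against executor out-of-memory errors alongside raising `spark.executor.memory`. A back-of-the-envelope sketch follows. The table size is invented, and the connector works from size estimates, so real partition counts will differ.

```python
import math


def estimated_partitions(table_size_mb: float, split_size_mb: float) -> int:
    """Rough number of Spark partitions for a full-table read."""
    return max(1, math.ceil(table_size_mb / split_size_mb))


table_size_mb = 512_000  # hypothetical table of about 500 GB
estimates = {split: estimated_partitions(table_size_mb, split) for split in (64, 256, 512)}

for split, partitions in estimates.items():
    print(f"split {split} MB -> about {partitions} partitions")
```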

Page 51:

WRITES SPEED SPARK

• cassandra.output.batch.size.bytes

• cassandra.output.batch.size.rows

• cassandra.output.concurrent.writes

• cassandra.output.throughput_mb_per_sec

Page 52:

Write Timeouts

cassandra.output.throughput_mb_per_sec
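
In the Spark Cassandra Connector these write settings are passed with a `spark.` prefix, for example `spark.cassandra.output.throughput_mb_per_sec`. Later connector releases renamed some properties, so check the reference for the version in use. The sketch below renders a few of them into `spark-submit --conf` flags. Every value is a placeholder chosen for illustration, not a recommendation from the talk, and the job file name is hypothetical.

```python
from typing import Dict, List

# Placeholder values only; tune against your own cluster.
WRITE_SETTINGS: Dict[str, str] = {
    "spark.cassandra.output.batch.size.bytes": "1024",
    "spark.cassandra.output.concurrent.writes": "5",
    "spark.cassandra.output.throughput_mb_per_sec": "5",
}


def to_submit_args(settings: Dict[str, str], app: str) -> List[str]:
    """Build a spark-submit argument list with one --conf flag per setting."""
    args = ["spark-submit"]
    for key in sorted(settings):
        args += ["--conf", f"{key}={settings[key]}"]
    args.append(app)
    return args


args = to_submit_args(WRITE_SETTINGS, "forklift_job.py")  # hypothetical job file
print(" ".join(args))
```

Capping `throughput_mb_per_sec` slows the writer down, which is presumably why the slide pairs it with write timeouts: a job that writes faster than the cluster can absorb is a common source of them.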

Page 53:
Page 54:

QUESTIONS?