Netflix viewing data architecture evolution - EBJUG Nov 2014


Netflix's architecture for viewing data has evolved as streaming usage has grown. Each generation was designed for the next order of magnitude, and was informed by learnings from the previous. From SQL to NoSQL, from data center to cloud, from proprietary to open source, look inside to learn how this system has evolved. (slides from a talk given at the East Bay Java Users Group MeetUp in Nov 2014)

Transcript of Netflix viewing data architecture evolution - EBJUG Nov 2014

Page 1: Netflix viewing data architecture evolution - EBJUG Nov 2014
Page 2: Netflix viewing data architecture evolution - EBJUG Nov 2014

Who am I?

Philip Fisher-Ogden

• Director of Engineering @ Netflix

• Playback Services (making “click play” work)

• 6 years @ Netflix, from 10 servers to 10,000s

Page 3: Netflix viewing data architecture evolution - EBJUG Nov 2014

Story

Netflix streaming – 2007 to present

Page 4: Netflix viewing data architecture evolution - EBJUG Nov 2014

Device Growth

2007: 1 device

2008: 10s of devices

2009: 10s of devices

2010: 100s of devices

2011+: 1000+ devices

Page 5: Netflix viewing data architecture evolution - EBJUG Nov 2014

Experience Evolution

Page 6: Netflix viewing data architecture evolution - EBJUG Nov 2014

Subscribers & Viewing

53M global subscribers

50 countries

>2 billion hours viewed per month

Page 7: Netflix viewing data architecture evolution - EBJUG Nov 2014

Internet Traffic

Page 8: Netflix viewing data architecture evolution - EBJUG Nov 2014

Improved Personalization

Better Experience

Viewing

Virtuous Cycle

Page 9: Netflix viewing data architecture evolution - EBJUG Nov 2014

Viewing Data

Who, What, When, Where, How Long

Page 10: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real time data use cases

What have I watched?

Page 11: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real time data use cases

Where was I at?

Page 12: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real time data use cases

What else am I watching?

Page 13: Netflix viewing data architecture evolution - EBJUG Nov 2014

Session Analytics

Page 14: Netflix viewing data architecture evolution - EBJUG Nov 2014

Session Analytics

Page 15: Netflix viewing data architecture evolution - EBJUG Nov 2014

Architecture Evolution

Page 16: Netflix viewing data architecture evolution - EBJUG Nov 2014

Guiding Lights

• “Design for ~10X, but plan to rewrite before ~100X”

– Jeff Dean from Google

Page 17: Netflix viewing data architecture evolution - EBJUG Nov 2014

Guiding Lights

• "Architecture should match the problem - don't over engineer from the start; evolve as you grow”

@randyshoup

Page 18: Netflix viewing data architecture evolution - EBJUG Nov 2014

Guiding Lights

• "If you don't end up regretting your early technology decisions, you probably over-engineered”

@randyshoup

Page 19: Netflix viewing data architecture evolution - EBJUG Nov 2014

Architecture Patterns

• Service oriented

• Command Query Responsibility Segregation

• Event Sourcing

• Polyglot Persistence

Page 20: Netflix viewing data architecture evolution - EBJUG Nov 2014

Service Oriented

• Encapsulated domain

– Models, Logic, Persistence

• Service Interface

• Monolith -> Microservices

– Evolutionary Design

Page 21: Netflix viewing data architecture evolution - EBJUG Nov 2014

CQRS

• Separate Commands (updates) from Queries (reads)

• Different conceptual model for write vs. read
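The split between commands and queries can be sketched as follows. This is a minimal in-memory illustration, not the talk's implementation; the class names (UpdatePosition, WriteModel, ReadModel) and the field names are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpdatePosition:
    """Command: a write request. All names here are illustrative."""
    customer_id: str
    movie_id: str
    position_sec: int


class ReadModel:
    """Query side: data shaped for 'all positions for a customer'."""

    def __init__(self):
        self.by_customer = {}  # customer_id -> {movie_id: position_sec}

    def project(self, cmd: UpdatePosition) -> None:
        self.by_customer.setdefault(cmd.customer_id, {})[cmd.movie_id] = cmd.position_sec

    def positions_for(self, customer_id: str) -> dict:
        return dict(self.by_customer.get(customer_id, {}))


class WriteModel:
    """Command side: validates updates and stores them per session."""

    def __init__(self, read_model: ReadModel):
        self.sessions = {}  # (customer_id, movie_id) -> position_sec
        self._read_model = read_model

    def handle(self, cmd: UpdatePosition) -> None:
        if cmd.position_sec < 0:
            raise ValueError("position must be non-negative")
        self.sessions[(cmd.customer_id, cmd.movie_id)] = cmd.position_sec
        self._read_model.project(cmd)  # keep the query view up to date


reads = ReadModel()
writes = WriteModel(reads)
writes.handle(UpdatePosition("c1", "m1", 120))
writes.handle(UpdatePosition("c1", "m2", 45))
writes.handle(UpdatePosition("c2", "m1", 300))
print(reads.positions_for("c1"))
```

The write side is keyed by session while the read side is keyed by customer, which is one way to read "different conceptual model for write vs. read".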

Page 22: Netflix viewing data architecture evolution - EBJUG Nov 2014

Event Sourcing

• Persist immutable events, not updatable state

• Replay events to determine state

• Optimize via snapshots, materialized views
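A minimal sketch of the three bullets above, assuming a simple playback event with a timestamp, a type, and a position. The Event fields and the shape of the state dictionary are hypothetical, chosen only to show replay and snapshotting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Immutable playback event; field names are illustrative."""
    timestamp: int      # seconds since session start
    event_type: str     # "start", "heartbeat", or "stop"
    position_sec: int


def replay(events, snapshot=None):
    """Fold events (after an optional snapshot) into the current state."""
    if snapshot:
        state = dict(snapshot)
    else:
        state = {"position_sec": 0, "active": False, "as_of": -1}
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.timestamp <= state["as_of"]:
            continue  # already reflected in the snapshot
        state["position_sec"] = ev.position_sec
        state["active"] = ev.event_type != "stop"
        state["as_of"] = ev.timestamp
    return state


log = [
    Event(0, "start", 0),
    Event(60, "heartbeat", 60),
    Event(120, "heartbeat", 120),
    Event(150, "stop", 150),
]

full = replay(log)                 # replay everything from the beginning
snapshot = replay(log[:2])         # state as of t=60
resumed = replay(log, snapshot)    # only events after t=60 are applied
print(full == resumed)
```

Replaying from the snapshot gives the same state as replaying the full log, which is what makes snapshots a safe optimization.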

Page 23: Netflix viewing data architecture evolution - EBJUG Nov 2014

Polyglot Persistence

• Different persistence technology for different use cases

• Flexibility vs. Complexity cost trade-off

Page 24: Netflix viewing data architecture evolution - EBJUG Nov 2014

Generic Architecture

Start / Stop

Collect

Event Stream

Process

Stream State

Session Summary

Provide

Active Sessions

Last Position

Viewing History

Data Feed

Page 25: Netflix viewing data architecture evolution - EBJUG Nov 2014

Conceptual Data Model

ViewRecordKey: CustomerID, Movie, Device, Start Timestamp

View 1 : 1 ViewRecordKey

Page 26: Netflix viewing data architecture evolution - EBJUG Nov 2014

Conceptual Data Model

ActiveSession: ViewRecordKey, SessionDetails

SessionDetails: Source of Play, Start Position, Latest Position, Latest Duration, Last Update Timestamp

View 1 : 1 ActiveSession

ActiveSession 1 : 1 SessionDetails

Page 27: Netflix viewing data architecture evolution - EBJUG Nov 2014

Conceptual Data Model

EventLog: ViewRecordKey, EventType, EventTimestamp, EventDetails

View 1 : 0..* EventLog
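The key and event-log entities from these slides could be modeled roughly as below. This is a sketch only: the Python types, the snake_case field names, and the dictionary used to hold the one-to-many relationship are assumptions, not the schema used in production.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewRecordKey:
    """Identifies one view: who, what, on which device, starting when."""
    customer_id: str
    movie: str
    device: str
    start_timestamp: int


@dataclass(frozen=True)
class EventLog:
    """One immutable event belonging to a view."""
    key: ViewRecordKey
    event_type: str
    event_timestamp: int
    event_details: str = ""


# One view maps to zero or more EventLog entries.
events_by_view = defaultdict(list)

key = ViewRecordKey("c1", "m1", "tv", 1000)
events_by_view[key].append(EventLog(key, "start", 1000))
events_by_view[key].append(EventLog(key, "pause", 1090, "user paused"))

other = ViewRecordKey("c1", "m2", "phone", 2000)  # a view with no events yet
print(len(events_by_view[key]), len(events_by_view.get(other, [])))
```

Making the key a frozen dataclass keeps it hashable, so it can serve directly as the lookup key for each data set.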

Page 28: Netflix viewing data architecture evolution - EBJUG Nov 2014

Conceptual Data Model

View

ViewRecordKey: CustomerID, Movie, Device, Start Timestamp

SessionDetails: Source of Play, Start Position, Latest Position, Latest Duration, Last Update Timestamp

EventLog (shown 4 times): ViewRecordKey, EventType, EventTimestamp, EventDetails

Page 29: Netflix viewing data architecture evolution - EBJUG Nov 2014

Conceptual Data Model

View

ViewRecordKey: CustomerID, Movie, Device, Start Timestamp

SessionDetails: Source of Play, Start Position, Latest Position, Latest Duration, Last Update Timestamp

EventLog (shown 4 times): ViewRecordKey, EventType, EventTimestamp, EventDetails

Summarize

ViewingRecord: ViewRecordKey, Duration, Position, Last Modified Timestamp

Position: CustomerID, Movie, Latest Position

Page 30: Netflix viewing data architecture evolution - EBJUG Nov 2014

Conceptual Data Model

Viewing History (keyed by CustomerID), shown as 8 ViewingRecord entries

ViewingRecord: ViewRecordKey, Duration, Position, Last Modified Timestamp

Latest Positions (keyed by CustomerID), shown as 7 Position entries

Position: CustomerID, Movie, Latest Position

Page 31: Netflix viewing data architecture evolution - EBJUG Nov 2014

Command Use Cases

Action                Operation      Key            DataSet
Start                 Insert         ViewRecordKey  ActiveSession, ViewingRecord
Continue (heartbeat)  Update         ViewRecordKey  ActiveSession
Log                   Insert         ViewRecordKey  EventLog
Stop                  Update         ViewRecordKey  ActiveSession, ViewingRecord
Snapshot              Insert/Update  CustomerID     ViewingHistory, Positions

Page 32: Netflix viewing data architecture evolution - EBJUG Nov 2014

Query Use Cases

Query                Operation    Key            DataSet
Currently watching?  Select/Read  ViewRecordKey  ActiveSession
Current position?    Select/Read  ViewRecordKey  ActiveSession
                                  CustomerID     Positions
All positions?       Select/Read  CustomerID     Positions
All history?         Select/Read  CustomerID     ViewingHistory

Page 33: Netflix viewing data architecture evolution - EBJUG Nov 2014

Architecture Evolution

• Different generations

• Pain points & learnings

• Re-architecture motivations

Page 34: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data

[Timeline: 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, Future. Rows: SQL, NoSQL, Caching (redis, memcached)]

Page 35: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 1

[Timeline: 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, Future. Rows: SQL, NoSQL, Caching (redis, memcached)]

Page 36: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 1

Start / Stop

Sessions

Logs / Events

History / Position

SQL

Page 37: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 1 pain points

• Scalability

– DB scaled up not out

• Event Data Analytics

– ad hoc

• Fixed schema

Page 38: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 2

[Timeline: 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, Future. Rows: SQL, NoSQL, Caching (redis, memcached)]

Page 39: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 2 motivations

• Scalability

– Scale out not up

• Flexible schema

– Key/value attributes

• Service oriented

Page 40: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 2

Start / Stop

Viewing Service

NoSQL

50 data partitions

Page 41: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 2 pain points

• Scale out

– Resharding was painful

• Performance

– Hot spots

• Disaster Recovery

– SimpleDB had no backups
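Why resharding hurts can be shown with a small experiment. The slides do not say how keys were mapped to the 50 data partitions, so the hash-modulo scheme below, the partition_for helper, and the customer IDs are all assumptions made for illustration.

```python
import hashlib


def partition_for(customer_id: str, num_partitions: int) -> int:
    """Stable hash of the key, reduced modulo the partition count."""
    digest = hashlib.md5(customer_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_partitions


customers = [f"customer-{i}" for i in range(10_000)]

# Count keys that land on a different partition after growing 50 -> 60.
moved = sum(
    1 for c in customers
    if partition_for(c, 50) != partition_for(c, 60)
)
print(f"{moved / len(customers):.0%} of keys move when resharding 50 -> 60")
```

Under a uniform hash, roughly five in six keys would be expected to change partition in this scenario, so adding capacity means rewriting most of the data.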

Page 42: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 3

[Timeline: 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, Future. Rows: SQL, NoSQL, Caching (redis, memcached)]

Page 43: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 3 landscape

• Cassandra 0.6

• Before SSDs in AWS

• Netflix in 1 AWS region

Page 44: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 3 motivations

• Order of magnitude increase in requests

• Scalability

– Actually scale out rather than up

Page 45: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 3

Viewing Service

Stateful Tier (nodes 0, 1, ..., n-2, n-1)

Active Sessions

Latest Positions

View Summary

Stateless Tier (fallback)

Sessions

Viewing History

Memcached

Page 46: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 3 writes

Viewing Service

Stateful Tier (nodes 0, 1, ..., n-2, n-1)

Start

Stop

Page 47: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 3 writes

Viewing Service

Stateful Tier (nodes 0, 1, ..., n-2, n-1)

Active Sessions

Latest Positions

View Summary

Start

Stop

Page 48: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 3 writes

Viewing Service

Stateful Tier (nodes 0, 1, ..., n-2, n-1)

Active Sessions

Latest Positions

View Summary

Start

Stop

update

Page 49: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 3 writes

Viewing Service

Stateful Tier (nodes 0, 1, ..., n-2, n-1)

Active Sessions

Latest Positions

View Summary

Start

Stop

snapshot

Sessions

Page 50: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 3 writes

Viewing Service

Stateful Tier (nodes 0, 1, ..., n-2, n-1)

Active Sessions

Latest Positions

View Summary

Start

Stop

Viewing History

Memcached

Page 51: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 3 writes

Viewing Service

Stateful Tier (nodes 0, 1, ..., n-2, n-1)

Active Sessions

Latest Positions

View Summary

Start

Stop

Viewing History

Memcached

Sessions

update

Page 52: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 3 writes

Viewing Service

Stateful Tier (nodes 0, 1, ..., n-2, n-1)

Stateless Tier (fallback)

Sessions

Viewing History

Memcached

Page 53: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 3 reads

Viewing Service

Stateless Tier

What have I watched?

Viewing History

Memcached

Page 54: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 3 reads

Viewing Service

Stateful Tier

Latest Positions

Where was I at?

Viewing History

Stateless Tier (fallback)

Memcached

Page 55: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 3 reads

Viewing Service

Stateful Tier

What else am I watching?

Active Sessions

Page 56: Netflix viewing data architecture evolution - EBJUG Nov 2014

Architecture Patterns - Discuss

• Service oriented

• Command Query Responsibility Segregation

• Event Sourcing

• Polyglot Persistence

Page 57: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 3

Viewing Service

Stateful Tier (nodes 0, 1, ..., n-2, n-1)

Active Sessions

Latest Positions

View Summary

Stateless Tier (fallback)

Sessions

Viewing History

Memcached

Page 58: Netflix viewing data architecture evolution - EBJUG Nov 2014

gen 3 - Requests Scale

Operation                     Scale
Create (start streaming)      1,000s per second
Update (heartbeat, close)     100,000s per second
Append (session events/logs)  10,000s per second
Read viewing history          10,000s per second
Read latest position          100,000s per second

Page 59: Netflix viewing data architecture evolution - EBJUG Nov 2014

gen 3 – Cluster Scale

Cluster                        Scale
Cassandra Viewing History      ~100 hi1.4xl nodes, ~48 TB total space used
Viewing Service Stateful Tier  ~1700 r3.2xl nodes, 50 GB heap memory per node
Memcached                      ~450 r3.2xl/xl nodes, ~8 TB memory used
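A quick back-of-the-envelope calculation puts the cluster figures on a per-node and total basis. The inputs are the approximate numbers from the slide, treated as exact; the decimal conversion (1 TB = 1000 GB) is an assumption.

```python
# Approximate figures from the slide; "~" values are treated as exact here.
cassandra_nodes, cassandra_tb = 100, 48
stateful_nodes, heap_gb_per_node = 1700, 50
memcached_nodes, memcached_tb = 450, 8

GB_PER_TB = 1000  # assumption: decimal units

cassandra_gb_per_node = cassandra_tb * GB_PER_TB / cassandra_nodes
stateful_heap_tb = stateful_nodes * heap_gb_per_node / GB_PER_TB
memcached_gb_per_node = memcached_tb * GB_PER_TB / memcached_nodes

print(f"Cassandra: {cassandra_gb_per_node:.0f} GB per node")
print(f"Stateful tier: {stateful_heap_tb:.0f} TB of heap in total")
print(f"Memcached: {memcached_gb_per_node:.1f} GB per node")
```

On these numbers the stateful tier holds more data in heap memory than Cassandra holds on disk, which helps explain why that tier dominates the gen 3 pain points.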

Page 60: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 3 pain points

• Stateful tier

– Hot spots

– Multi-region complexity

• Monolithic service

• Read-modify-write is poorly suited to memcached
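The read-modify-write problem can be sketched with a toy cache. VersionedCache below is a hypothetical in-memory stand-in, modeled loosely on memcached's gets and cas commands; it is not a real client library, and the keys and values are illustrative.

```python
class VersionedCache:
    """Toy in-memory cache with check-and-set semantics."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def gets(self, key):
        """Return (value, version); a missing key is (None, 0)."""
        return self._data.get(key, (None, 0))

    def set(self, key, value):
        """Blind write: overwrites whatever is there."""
        _, version = self._data.get(key, (None, 0))
        self._data[key] = (value, version + 1)

    def cas(self, key, value, expected_version):
        """Write only if nobody else has written since our read."""
        _, version = self._data.get(key, (None, 0))
        if version != expected_version:
            return False
        self._data[key] = (value, version + 1)
        return True


def append_with_cas(cache, key, item, max_retries=5):
    """Read-modify-write loop that retries when a concurrent write wins."""
    for _ in range(max_retries):
        current, version = cache.gets(key)
        updated = (current or []) + [item]
        if cache.cas(key, updated, version):
            return True
    return False


cache = VersionedCache()

# Lost update: two writers read the same value, then both blindly set.
a, _ = cache.gets("history:c1")
b, _ = cache.gets("history:c1")
cache.set("history:c1", (a or []) + ["movie-1"])
cache.set("history:c1", (b or []) + ["movie-2"])
lost = cache.gets("history:c1")[0]   # movie-1 has been overwritten

# With check-and-set, each append builds on the latest version.
append_with_cas(cache, "history:c2", "movie-1")
append_with_cas(cache, "history:c2", "movie-2")
kept = cache.gets("history:c2")[0]
print(lost, kept)
```

Every update to a cached list needs a read, a modification, and a guarded write with retries, which is costly at the heartbeat rates listed for gen 3.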

Page 61: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 3 learnings

• Distributed stateful systems are hard

– Go stateless, use C*/memcached/redis…

• Decompose into microservices

Page 62: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 4

Viewing Service

Stateful Tier (nodes 0, 1, ..., n-2, n-1)

Active Sessions

Latest Positions

View Summary

Stateless Tier (fallback)

Viewing History

Sessions

Memcached

Page 63: Netflix viewing data architecture evolution - EBJUG Nov 2014

Real Time Data – gen 4

(Work in progress)

Page 64: Netflix viewing data architecture evolution - EBJUG Nov 2014

Microservices: Components as Services

Collector

Processor

Provider

Events

Queries

Page 65: Netflix viewing data architecture evolution - EBJUG Nov 2014

Microservices: Decoupled Communication

Collector Processor Provider

Events Materialized Views

Signals

Page 66: Netflix viewing data architecture evolution - EBJUG Nov 2014

Request Processing Design

• Dimensions:

– Response required?

– Latency target?

– Where?

• In process

• Remote process

Page 67: Netflix viewing data architecture evolution - EBJUG Nov 2014

Request Processing Design

Low-latency tasks

Medium-latency async tasks

High-latency async tasks

Response: Required

Latency: Low

Where: In-process

Page 68: Netflix viewing data architecture evolution - EBJUG Nov 2014

Request Processing Design

Low-latency tasks

Medium-latency async tasks

High-latency async tasks

Response: Not required

Latency: Medium

Where: In-process

Page 69: Netflix viewing data architecture evolution - EBJUG Nov 2014

Request Processing Design

Low-latency tasks

Medium-latency async tasks

High-latency async tasks

Response: Not required

Latency: High

Where: Remote process
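The three dimensions and the three task categories from these slides can be written down as a lookup. This is a sketch: the Lane enum, the choose_lane function, and the string values are hypothetical names, and only the three combinations shown on the slides are covered.

```python
from enum import Enum


class Lane(Enum):
    LOW_LATENCY = "low-latency tasks"
    MEDIUM_ASYNC = "medium-latency async tasks"
    HIGH_ASYNC = "high-latency async tasks"


# (response required?, latency target, where) -> processing lane
LANES = {
    (True, "low", "in-process"): Lane.LOW_LATENCY,
    (False, "medium", "in-process"): Lane.MEDIUM_ASYNC,
    (False, "high", "remote"): Lane.HIGH_ASYNC,
}


def choose_lane(response_required: bool, latency: str, where: str) -> Lane:
    """Map the three slide dimensions onto a processing lane."""
    combo = (response_required, latency, where)
    if combo not in LANES:
        raise ValueError(f"combination not covered by the slides: {combo}")
    return LANES[combo]


print(choose_lane(True, "low", "in-process").value)
print(choose_lane(False, "high", "remote").value)
```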

Page 70: Netflix viewing data architecture evolution - EBJUG Nov 2014

Start Streaming Example

Page 71: Netflix viewing data architecture evolution - EBJUG Nov 2014

Start Streaming

Start

Stop

Low-latency tasks

Medium-latency async tasks

Viewing History

Sessions

High-latency async tasks

Page 72: Netflix viewing data architecture evolution - EBJUG Nov 2014

Start Streaming

Start

Stop

Low-latency tasks

Viewing History

Sessions

Page 73: Netflix viewing data architecture evolution - EBJUG Nov 2014

Start Streaming

Viewing History

Sessions

Check Active Sessions within Account Limits

Page 74: Netflix viewing data architecture evolution - EBJUG Nov 2014

Start Streaming

Viewing History

Sessions

Persist session

Page 75: Netflix viewing data architecture evolution - EBJUG Nov 2014

Start Streaming

Viewing History

Sessions

Enqueue

Save to Viewing History

Page 76: Netflix viewing data architecture evolution - EBJUG Nov 2014

Start Streaming

Viewing History

Sessions

Within limit, respond OK.

Save to Viewing History

Asynchronous
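The start-streaming steps on the preceding slides can be sketched as below. The in-memory dictionary and queue stand in for the Sessions and Viewing History stores, and the limit of two concurrent streams, the function names, and the "OK"/"DENIED" responses are assumptions for illustration.

```python
import queue

MAX_CONCURRENT_STREAMS = 2        # assumption: the real limit is not given
active_sessions = {}              # customer_id -> set of session ids
viewing_history_queue = queue.Queue()


def start_streaming(customer_id: str, session_id: str) -> str:
    sessions = active_sessions.setdefault(customer_id, set())
    # 1. Check active sessions within account limits (low-latency, blocking).
    if len(sessions) >= MAX_CONCURRENT_STREAMS:
        return "DENIED"
    # 2. Persist the session.
    sessions.add(session_id)
    # 3. Enqueue the viewing-history write; it is saved asynchronously.
    viewing_history_queue.put((customer_id, session_id))
    # 4. Within limit: respond OK without waiting for the history write.
    return "OK"


def drain_history_queue(history: dict) -> None:
    """Stand-in for the async worker that saves to viewing history."""
    while not viewing_history_queue.empty():
        customer_id, session_id = viewing_history_queue.get()
        history.setdefault(customer_id, []).append(session_id)


results = [start_streaming("c1", s) for s in ("s1", "s2", "s3")]
history = {}
drain_history_queue(history)
print(results, history)
```

Only the limit check and the session write sit on the response path; the viewing-history write is deferred, so the caller gets its answer without waiting on it.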

Page 77: Netflix viewing data architecture evolution - EBJUG Nov 2014

Session Interactions

Start

Stop

Collectors* | Processors*

Viewing History

Session Events

Positions

Page 78: Netflix viewing data architecture evolution - EBJUG Nov 2014

Session Summary Example

• End playback

• Summarize session

Page 79: Netflix viewing data architecture evolution - EBJUG Nov 2014

Session Summary

Start

Stop

Low-latency blocking tasks

Medium-latency async tasks

Session Summarizer

High-latency async tasks

Collector

Processor

Page 80: Netflix viewing data architecture evolution - EBJUG Nov 2014

Session Summary

Start

Stop

High-latency async tasks

Collector

Processor

Session Summarizer

Page 81: Netflix viewing data architecture evolution - EBJUG Nov 2014

Session Summary

Start

Stop

High-latency async tasks

Processor

Collector

Session Summarizer

Page 82: Netflix viewing data architecture evolution - EBJUG Nov 2014

Session Summary

Session Events

Session Summarizer

Retrieve by Session Key

Page 83: Netflix viewing data architecture evolution - EBJUG Nov 2014

Session Summary

Session Events

Session Summarizer

Retrieve by Session Key

Page 84: Netflix viewing data architecture evolution - EBJUG Nov 2014

Session Summary

Session Events

Session Summarizer

Order

Page 85: Netflix viewing data architecture evolution - EBJUG Nov 2014

Session Summary

Session Events

Session Summarizer

Summarize

Viewing History

Positions
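The summarizer steps (retrieve by session key, order, summarize) can be sketched as follows. The SessionEvent fields and the contents of the summary are hypothetical; the slides do not specify what the real summary contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionEvent:
    """One playback event; field names are illustrative."""
    session_key: str
    timestamp: int
    position_sec: int


def summarize(events, session_key):
    # Retrieve by session key.
    mine = [e for e in events if e.session_key == session_key]
    if not mine:
        return None
    # Order by event time, since events may arrive out of order.
    mine.sort(key=lambda e: e.timestamp)
    # Summarize into values for viewing history and positions.
    return {
        "session_key": session_key,
        "duration_sec": mine[-1].timestamp - mine[0].timestamp,
        "latest_position_sec": mine[-1].position_sec,
    }


events = [
    SessionEvent("s1", 130, 120),
    SessionEvent("s2", 50, 10),
    SessionEvent("s1", 10, 0),
    SessionEvent("s1", 70, 60),
]
summary = summarize(events, "s1")
print(summary)
```

Because the summary is derived entirely from stored events, it can be recomputed at any time, which is the event-sourcing pattern from earlier in the talk applied to session data.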

Page 86: Netflix viewing data architecture evolution - EBJUG Nov 2014

Architecture Patterns - Discuss

• Service oriented

• Command Query Responsibility Segregation

• Event Sourcing

• Polyglot Persistence

Page 87: Netflix viewing data architecture evolution - EBJUG Nov 2014

Takeaways

• Architectural Patterns

• Evolutionary Design

– Evolve as you grow

• Re-architect for order of magnitude shifts

Page 88: Netflix viewing data architecture evolution - EBJUG Nov 2014

Questions?

@philip_pfo

Page 89: Netflix viewing data architecture evolution - EBJUG Nov 2014

Feedback?

@philip_pfo

Page 90: Netflix viewing data architecture evolution - EBJUG Nov 2014

Thanks!

@philip_pfo