Netflix viewing data architecture evolution - EBJUG Nov 2014
Who am I?
Philip Fisher-Ogden
• Director of Engineering @ Netflix
• Playback Services (making “click play” work)
• 6 years @ Netflix, from 10 servers to 10,000s
Story
Netflix streaming – 2007 to present
Device Growth
• 2007: 1 device
• 2008: 10s of devices
• 2009: 10s of devices
• 2010: 100s of devices
• 2011+: 1000+ devices
Experience Evolution
Subscribers & Viewing
53M global subscribers
50 countries
>2 billion hours viewed per month
Internet Traffic
Virtuous Cycle
[Cycle diagram: Viewing, Improved Personalization, Better Experience]
Viewing Data
Who, What, When, Where, How Long
Real time data use cases
• What have I watched?
• Where was I at?
• What else am I watching?
Session Analytics
Architecture Evolution
Guiding Lights
• “Design for ~10X, but plan to rewrite before ~100X”
– Jeff Dean from Google
• “Architecture should match the problem - don't over-engineer from the start; evolve as you grow”
– @randyshoup
• “If you don't end up regretting your early technology decisions, you probably over-engineered”
– @randyshoup
Architecture Patterns
• Service oriented
• Command Query Responsibility Segregation
• Event Sourcing
• Polyglot Persistence
Service Oriented
• Encapsulated domain
– Models, Logic, Persistence
• Service Interface
• Monolith -> Microservices
– Evolutionary Design
CQRS
• Separate Commands (updates) from Queries (reads)
• Different conceptual model for write vs. read
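A minimal sketch of the CQRS idea, not the Netflix implementation: the write side only accepts commands, while the read side keeps a separately shaped model for queries. The names `RecordPosition`, `PositionWriteModel`, and `LatestPositionReadModel` are hypothetical, and positions are assumed to be in seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordPosition:
    """Command: carries an update and returns nothing."""
    customer_id: str
    movie_id: str
    position_sec: int


class PositionWriteModel:
    """Write side: an append-only list of accepted commands."""

    def __init__(self):
        self.accepted = []

    def handle(self, command):
        self.accepted.append(command)


class LatestPositionReadModel:
    """Read side: a lookup table shaped for the 'where was I at?' query."""

    def __init__(self):
        self._latest = {}

    def project(self, command):
        key = (command.customer_id, command.movie_id)
        self._latest[key] = command.position_sec

    def latest_position(self, customer_id, movie_id):
        return self._latest.get((customer_id, movie_id))


write_model = PositionWriteModel()
read_model = LatestPositionReadModel()

for cmd in (RecordPosition("c1", "m1", 30), RecordPosition("c1", "m1", 95)):
    write_model.handle(cmd)
    read_model.project(cmd)  # in a real system this step could be asynchronous

resume_at = read_model.latest_position("c1", "m1")
```

The write model keeps every command, while the read model keeps only what the query needs, which is the "different conceptual model" the slide refers to.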
Event Sourcing
• Persist immutable events, not updatable state
• Replay events to determine state
• Optimize via snapshots, materialized views
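The three bullets above can be sketched as follows, under stated assumptions: the event types (`start`, `heartbeat`, `stop`), the state fields, and the `replay` helper are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """An immutable fact; once appended it is never updated."""
    timestamp: int
    event_type: str      # hypothetical types: "start", "heartbeat", "stop"
    position_sec: int


def apply(state, event):
    """Fold one event into the state and return a new state dict."""
    new_state = dict(state)
    new_state["position_sec"] = event.position_sec
    new_state["active"] = event.event_type != "stop"
    new_state["last_timestamp"] = event.timestamp
    return new_state


def replay(events, snapshot=None):
    """Rebuild state from events, optionally starting from a snapshot."""
    if snapshot:
        state = dict(snapshot)
    else:
        state = {"position_sec": 0, "active": False, "last_timestamp": -1}
    for event in sorted(events, key=lambda e: e.timestamp):
        # Skip events that the snapshot already covers.
        if event.timestamp > state["last_timestamp"]:
            state = apply(state, event)
    return state


log = [
    Event(0, "start", 0),
    Event(60, "heartbeat", 60),
    Event(120, "heartbeat", 120),
    Event(150, "stop", 150),
]

full = replay(log)
snapshot = replay(log[:2])             # state after the first two events
from_snapshot = replay(log, snapshot)  # only the later events are applied
```

Replaying from the snapshot yields the same state as replaying the full log, which is what makes snapshots a safe optimization.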
Polyglot Persistence
• Different persistence technology for different use cases
• Flexibility vs. Complexity cost trade-off
Generic Architecture
[Diagram: Start and Stop events pass through three stages: Collect, Process, Provide. Other labels: Event Stream, Stream State, Session Summary; Active Sessions, Last Position, Viewing History, Data Feed]
Conceptual Data Model
• ViewRecordKey (identifies one View, 1:1)
– CustomerID
– Movie
– Device
– Start Timestamp
Conceptual Data Model
• ActiveSession (one per View, 1:1)
– ViewRecordKey
– SessionDetails
• SessionDetails
– Source of Play
– Start Position
– Latest Position
– Latest Duration
– Last Update Timestamp
– …
Conceptual Data Model
• EventLog (zero or more per View, 1:0..*)
– ViewRecordKey
– EventType
– EventTimestamp
– EventDetails
Conceptual Data Model
• A View (ViewRecordKey: CustomerID, Movie, Device, Start Timestamp) has many EventLog entries (ViewRecordKey, EventType, EventTimestamp, EventDetails)
• Summarize: the EventLog entries are reduced to SessionDetails (Source of Play, Start Position, Latest Position, Latest Duration, Last Update Timestamp, …)
Conceptual Data Model
• View, EventLog entries, and the summarized SessionDetails, shown together with two further records:
• ViewingRecord
– ViewRecordKey
– Duration
– Position
– Last Modified Timestamp
• Position
– CustomerID
– Movie
– Latest Position
Conceptual Data Model
• Viewing History: many ViewingRecord entries (ViewRecordKey, Duration, Position, Last Modified Timestamp), keyed by CustomerID
• Latest Positions: many Position entries (CustomerID, Movie, Latest Position), keyed by CustomerID
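As a rough sketch, the records above could be modeled like this. The field names follow the slides; the Python types, the use of seconds, the in-memory dictionaries, and the `snapshot` helper are assumptions for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewRecordKey:
    customer_id: str
    movie: str
    device: str
    start_timestamp: int


@dataclass
class ViewingRecord:
    key: ViewRecordKey
    duration: int
    position: int
    last_modified_timestamp: int


@dataclass
class Position:
    customer_id: str
    movie: str
    latest_position: int


# Both collections are keyed by CustomerID.
viewing_history = defaultdict(list)   # CustomerID -> [ViewingRecord, ...]
latest_positions = defaultdict(dict)  # CustomerID -> {Movie: Position}


def snapshot(record):
    """Insert a ViewingRecord and upsert the matching Position."""
    customer = record.key.customer_id
    viewing_history[customer].append(record)
    latest_positions[customer][record.key.movie] = Position(
        customer, record.key.movie, record.position
    )


snapshot(ViewingRecord(ViewRecordKey("c1", "m1", "tv", 1000), 600, 600, 1600))
snapshot(ViewingRecord(ViewRecordKey("c1", "m1", "phone", 5000), 300, 900, 5300))
```

Viewing History keeps one record per view, while Latest Positions keeps one entry per customer and movie, so the second view on a different device overwrites the position but not the history.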
Command Use Cases
Action | Operation | Key | DataSet
Start | Insert | ViewRecordKey | ActiveSession, ViewingRecord
Continue (heartbeat) | Update | ViewRecordKey | ActiveSession
Log | Insert | ViewRecordKey | EventLog
Stop | Update | ViewRecordKey | ActiveSession, ViewingRecord
Snapshot | Insert/Update | CustomerID | ViewingHistory, Positions
Query Use Cases
Query | Operation | Key | DataSet
Currently watching? | Select/Read | ViewRecordKey | ActiveSession
Current position? | Select/Read | ViewRecordKey | ActiveSession
Current position? | Select/Read | CustomerID | Positions
All positions? | Select/Read | CustomerID | Positions
All history? | Select/Read | CustomerID | ViewingHistory
Architecture Evolution
• Different generations
• Pain points & learnings
• Re-architecture motivations
Real Time Data
[Timeline figure: 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, Future; rows: SQL, NoSQL, Caching (memcached, redis)]
Real Time Data – gen 1
[Diagram: Start and Stop events; Sessions, Logs / Events, and History / Position stored in SQL]
Real Time Data – gen 1 pain points
• Scalability
– DB scaled up not out
• Event Data Analytics
– ad hoc
• Fixed schema
Real Time Data – gen 2
Real Time Data – gen 2 motivations
• Scalability
– Scale out not up
• Flexible schema
– Key/value attributes
• Service oriented
Real Time Data – gen 2
[Diagram: Start and Stop events reach the Viewing Service, backed by NoSQL with 50 data partitions]
Real Time Data – gen 2 pain points
• Scale out
– Resharding was painful
• Performance
– Hot spots
• Disaster Recovery
– SimpleDB had no backups
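To illustrate why resharding and hot spots hurt in a partitioned store, here is a sketch that assumes keys are assigned by a stable hash modulo the partition count. The slides only say "50 data partitions"; the hash-modulo scheme, the key names, and the single very active customer are assumptions.

```python
import hashlib


def partition_for(key, partition_count):
    """Map a key to a partition with a stable hash and modulo (an assumed scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % partition_count


keys = [f"customer-{i}" for i in range(10_000)]

# Fraction of keys that change partition when growing from 50 to 60 partitions.
moved = sum(1 for k in keys if partition_for(k, 50) != partition_for(k, 60))
moved_fraction = moved / len(keys)

# Hot spots: keys are spread evenly, but load per key need not be.
load = [0] * 50
for k in keys:
    load[partition_for(k, 50)] += 1
load[partition_for("customer-42", 50)] += 5_000  # one hypothetical very active customer
hottest_share = max(load) / sum(load)
```

With uniform hashes, a key keeps its partition only when its hash leaves the same remainder modulo 50 and modulo 60, which holds for one hash in six, so about five in six keys have to move. A single busy key is enough to push one partition well above its fair share of 1/50.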
Real Time Data – gen 3
Real Time Data – gen 3 landscape
• Cassandra 0.6
• Before SSDs in AWS
• Netflix in 1 AWS region
Real Time Data – gen 3 motivations
• Order of magnitude increase in requests
• Scalability
– Actually scale out rather than up
Real Time Data – gen 3
[Diagram: Viewing Service with a Stateful Tier (nodes 0, 1, …, n-2, n-1) holding Active Sessions, Latest Positions, and View Summary, plus a Stateless Tier (fallback). Backing stores: Sessions, Viewing History, Memcached]
Real Time Data – gen 3 writes
[Diagram sequence, Viewing Service]
• Start and Stop events arrive at the Stateful Tier (nodes 0, 1, …, n-2, n-1)
• update: Active Sessions, Latest Positions, View Summary
• snapshot: Sessions
• update: Viewing History, Memcached
• Stateless Tier (fallback): Sessions, Viewing History, Memcached
Real Time Data – gen 3 reads
[Diagram sequence, Viewing Service]
• What have I watched? Stateless Tier: Viewing History, Memcached
• Where was I at? Stateful Tier: Latest Positions; Stateless Tier (fallback): Viewing History, Memcached
• What else am I watching? Stateful Tier: Active Sessions
Architecture Patterns - Discuss
• Service oriented
• Command Query Responsibility Segregation
• Event Sourcing
• Polyglot Persistence
gen 3 - Requests Scale
Operation | Scale
Create (start streaming) | 1,000s per second
Update (heartbeat, close) | 100,000s per second
Append (session events/logs) | 10,000s per second
Read viewing history | 10,000s per second
Read latest position | 100,000s per second
gen 3 – Cluster Scale
Cluster | Scale
Cassandra Viewing History | ~100 hi1.4xl nodes, ~48 TB total space used
Viewing Service Stateful Tier | ~1700 r3.2xl nodes, 50 GB heap memory per node
Memcached | ~450 r3.2xl/xl nodes, ~8 TB memory used
Real Time Data – gen 3 pain points
• Stateful tier
– Hot spots
– Multi-region complexity
• Monolithic service
• Read-modify-write poorly suited for memcached
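The slides do not spell out why read-modify-write fits memcached poorly. One plausible reading, shown here as an assumption, is the lost-update problem: two writers read the same cached value, each modifies its own copy, and the second write silently discards the first. The dictionary below is a stand-in for a cache, not a real memcached client, and the interleaving is forced by hand.

```python
cache = {}  # stand-in for a key/value cache; not a real memcached client


def cache_get(key):
    return list(cache.get(key, []))  # copy, as a remote cache would return


def cache_set(key, value):
    cache[key] = list(value)


# Two writers append to the same customer's history with read-modify-write.
key = "history:c1"
cache_set(key, ["m1"])

seen_by_a = cache_get(key)          # writer A reads
seen_by_b = cache_get(key)          # writer B reads before A writes back
cache_set(key, seen_by_a + ["m2"])
cache_set(key, seen_by_b + ["m3"])  # overwrites A's write

final_history = cache_get(key)
```

Writer A's update is gone from the final value. Memcached's check-and-set commands can detect this kind of conflict, but every conflict then costs a retry round trip, which adds up at high update rates.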
Real Time Data – gen 3 learnings
• Distributed stateful systems are hard
– Go stateless, use C*/memcached/redis…
• Decompose into microservices
Real Time Data – gen 4
(Work in progress)
Microservices: Components as Services
[Diagram: Collector, Processor, and Provider as separate services; labels: Events, Queries]
Microservices: Decoupled Communication
[Diagram: Collector, Processor, and Provider; labels: Events, Materialized Views, Signals]
Request Processing Design
• Dimensions:
– Response required?
– Latency target?
– Where?
• In process
• Remote process
Request Processing Design
Task class | Response | Latency | Where
Low-latency tasks | Required | Low | In-process
Medium-latency async tasks | Not required | Medium | In-process
High-latency async tasks | Not required | High | Remote process
Start Streaming Example
[Diagram sequence: a Start request passes through low-latency tasks, medium-latency async tasks, and high-latency async tasks, backed by the Sessions and Viewing History stores]
• Check Active Sessions within account limits
• Persist session
• Enqueue "Save to Viewing History"
• Within limit, respond OK
• Save to Viewing History asynchronously
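The start-streaming flow can be sketched as a synchronous low-latency path plus a queued follow-up. This is an illustration under assumptions, not the production design: the limit of 2 concurrent streams, the in-memory stores, and the `drain_queue` worker are hypothetical.

```python
from collections import defaultdict, deque

MAX_CONCURRENT_STREAMS = 2           # hypothetical account limit

sessions = defaultdict(set)          # stand-in for the Sessions store
viewing_history = defaultdict(list)  # stand-in for the Viewing History store
async_queue = deque()                # stand-in for a real task queue


def start_streaming(customer_id, movie_id):
    """Low-latency path: check limit, persist session, enqueue, respond."""
    if len(sessions[customer_id]) >= MAX_CONCURRENT_STREAMS:
        return "LIMIT_EXCEEDED"
    sessions[customer_id].add(movie_id)  # persist session
    async_queue.append(("save_history", customer_id, movie_id))  # enqueue
    return "OK"  # respond before the history write happens


def drain_queue():
    """Async path: would run on a background worker in a real service."""
    while async_queue:
        _, customer_id, movie_id = async_queue.popleft()
        viewing_history[customer_id].append(movie_id)


responses = [start_streaming("c1", m) for m in ("m1", "m2", "m3")]
history_before = list(viewing_history["c1"])
drain_queue()
history_after = list(viewing_history["c1"])
```

The caller gets its answer as soon as the session is persisted; the viewing history catches up only when the queue is drained, which is the trade-off behind splitting tasks by latency class.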
Session Interactions
[Diagram: Start and Stop events pass through Collectors* | Processors*; stores: Viewing History, Session Events, Positions]
Session Summary Example
• End playback
• Summarize session
Session Summary
[Diagram sequence: a Stop event passes through the Collector and Processor to the Session Summarizer, a high-latency async task]
• Retrieve Session Events by Session Key
• Order
• Summarize into Viewing History and Positions
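The summarizer's retrieve, order, and summarize steps might look like the sketch below. The event dictionaries, the summary fields, and the definition of duration as the time between the first and last event are assumptions made for illustration.

```python
def summarize_session(session_key, event_store):
    """Retrieve by session key, order by timestamp, reduce to a summary."""
    events = event_store.get(session_key, [])               # retrieve
    ordered = sorted(events, key=lambda e: e["timestamp"])  # order
    if not ordered:
        return None
    return {                                                # summarize
        "session_key": session_key,
        "start_position": ordered[0]["position"],
        "latest_position": ordered[-1]["position"],
        "duration": ordered[-1]["timestamp"] - ordered[0]["timestamp"],
    }


# Events may arrive out of order, which is why the ordering step matters.
session_events = {
    ("c1", "m1", "tv", 1000): [
        {"timestamp": 1120, "type": "heartbeat", "position": 120},
        {"timestamp": 1000, "type": "start", "position": 0},
        {"timestamp": 1180, "type": "stop", "position": 180},
    ]
}

summary = summarize_session(("c1", "m1", "tv", 1000), session_events)
viewing_history = {"c1": [summary]}
positions = {("c1", "m1"): summary["latest_position"]}
```

Because the summary is derived entirely from stored events, it can be recomputed at any time, which suits a high-latency async task.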
Architecture Patterns - Discuss
• Service oriented
• Command Query Responsibility Segregation
• Event Sourcing
• Polyglot Persistence
Takeaways
• Architectural Patterns
• Evolutionary Design
– Evolve as you grow
• Re-architect for order of magnitude shifts
Questions? Feedback?
Thanks!
@philip_pfo