NOW TV and Linear Streaming: The unpredictable scalability challenge - Devoxx UK 2015


Tom Maule – NOW TV Solution Architect

2

Who am I?

•  Tom Maule

–  Solution Architect at NOW TV, Sky

–  Previously Senior Java Developer on the NOW TV Platform team (since project inception in early 2012)

–  I have also previously worked in the defence and telecoms industries

[email protected]

linkedin.com/in/tommaule

@tommaule

3

Abstract

•  NOW TV Introduction

•  Linear streaming challenges

•  7th April 2014

•  Fixes and improvements

•  13th April 2015

•  Future work and next steps

4

Introduction - Overview

•  NOW TV is the online no-contract TV streaming service from Sky

•  Available on over 60 devices including the award-winning NOW TV Box

•  NOW TV offers movies and entertainment VOD and linear content, and for the first time in the UK, pay-as-you-go Sports linear content

5

Introduction - Customer Base

•  Our customer base TRIPLED in the year up to April 2014

[Chart: customer base, 2013 compared with 2014]

6

Introduction – Streaming Features

VOD Streaming

VOD DRM

Linear Streaming

Linear DRM

Concurrency Limits

7

Introduction - NOW TV Architecture

[Architecture diagram. Components: Content, Video Assets, VOD Transcoding, Linear Transcoding, CDN, User device, NOW TV Platform (Load Balancers, Services, Logs), Content Metadata, Account Data. Flows: Asset upload, Stream upload, Manifest and video chunks, Live video stream, Content metadata, User services. Monitoring & alerting: Splunk, MMS, Icinga, New Relic.]

8

Video On Demand (VOD)

•  Video content, available on demand, whenever users want it

•  Platform load is predictable: just ask Netflix, Amazon Instant Video, YouTube, etc.

9

Video On Demand (VOD)

•  Even weekend load, though busier during the day, remains predictable

10

Linear Streaming

•  Unlike other OTT (Over-the-Top) providers, NOW TV offers streaming of live channels

•  Linear load is typically NOT predictable

•  Load is driven by live events, not by time of day

[Chart: Linear load compared with VOD load]


16

Commemorative Merchandise

17

18

Why did we not see this coming?


20

What happened?

•  High load stressed our database

•  Retries only compounded the problem

•  Observed issues:

–  Customers couldn’t start new streams

–  Existing streams were terminated

–  Concurrency errors during and shortly after the outage

–  Very high read and write queues in Mongo DB

–  Entitlement and Viewing History APIs performed very slowly

–  High proportion of time was spent updating indexes in Mongo DB

21

Issues to Address

•  Heartbeating resiliency

•  Concurrency inaccuracies

•  Entitlement checking

•  Products storage

•  Viewing History

•  Indexes in Mongo DB

•  Mongo DB write lock


22

Heartbeating: Introduction

•  After playout initiation, the actual video chunks are served by the CDN and don't touch our platform

•  Lightweight heartbeats call back to our platform every 10 minutes to notify us of continued playout

•  NOW TV uses heartbeats to:

–  Enforce concurrency rules

–  Enforce entitlement

–  Record bookmark positions (VOD only)

[Diagram: the user device receives video chunks from the CDN and sends heartbeats to NOW TV at a 10 minute interval]

23

Heartbeating: Previously

•  Previously, a non-OK heartbeat response would terminate playout on the user’s device

•  Fail in favour of NOW TV

–  When the NOW TV platform is unavailable, existing playouts are terminated on the next heartbeat

[Diagram: video chunks flow from the CDN until a heartbeat to NOW TV receives a non-OK response, at which point the video chunks stop]

26

Heartbeating: Today

•  Today, playout continues unless a specific STOP heartbeat response is received

•  Fail in favour of the customer

–  Existing streams will NOT be terminated if NOW TV becomes unavailable

[Diagram: a heartbeat to NOW TV receives a non-STOP response and video chunks continue to flow from the CDN]
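The change from "fail in favour of NOW TV" to "fail in favour of the customer" can be sketched as two client-side policies. This is only an illustration: the response values `"OK"` and `"ERROR"`, the use of `None` for an unreachable platform, and the function names are assumptions, and only the STOP response is named on the slides.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    CONTINUE = "continue"
    TERMINATE = "terminate"


def previous_policy(response: Optional[str]) -> Action:
    """Old behaviour: anything other than an OK response ends playout."""
    return Action.CONTINUE if response == "OK" else Action.TERMINATE


def current_policy(response: Optional[str]) -> Action:
    """New behaviour: only an explicit STOP response ends playout."""
    return Action.TERMINATE if response == "STOP" else Action.CONTINUE


# None stands for a timeout or an unreachable platform (an assumption of this sketch).
for response in ("OK", "STOP", "ERROR", None):
    print(response, previous_policy(response).value, current_policy(response).value)
```

Under the current policy an outage (represented here by `None` or `"ERROR"`) leaves existing streams running, while an explicit STOP still ends playout.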

28

Heartbeating: Future

•  Game of Thrones Linear customers produce ripple-effect heartbeating

–  Due to heartbeats being fixed to a 10 minute period

•  In future, we will randomise the first heartbeat period in an attempt to smooth out these ripples
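A rough simulation of why randomising the first heartbeat period should smooth the ripples. The audience size, the assumption that every viewer starts at the same instant, and the uniform distribution for the first period are all choices made for this sketch, not details from the talk.

```python
import random
from collections import Counter
from typing import List

HEARTBEAT_PERIOD_S = 600  # the fixed 10 minute period from the slides


def heartbeat_times(start_s: float, beats: int, first_delay_s: float) -> List[float]:
    """Times of the first `beats` heartbeats for one stream."""
    first = start_s + first_delay_s
    return [first + i * HEARTBEAT_PERIOD_S for i in range(beats)]


def peak_per_second(schedules: List[List[float]]) -> int:
    """Largest number of heartbeats landing in any one-second bucket."""
    buckets = Counter(int(t) for schedule in schedules for t in schedule)
    return max(buckets.values())


rng = random.Random(2015)  # seeded so the sketch is repeatable
VIEWERS = 1000             # hypothetical audience, all starting at the same moment

# Fixed first period: every viewer heartbeats at exactly the same times.
fixed = [heartbeat_times(0.0, 3, HEARTBEAT_PERIOD_S) for _ in range(VIEWERS)]
# Randomised first period: each viewer's schedule is shifted by a random offset.
jittered = [
    heartbeat_times(0.0, 3, rng.uniform(0, HEARTBEAT_PERIOD_S)) for _ in range(VIEWERS)
]

print("peak heartbeats/second, fixed first period:", peak_per_second(fixed))
print("peak heartbeats/second, randomised first period:", peak_per_second(jittered))
```

With a fixed period, everyone who started watching at the top of the hour heartbeats in the same second. With a randomised first period the same number of heartbeats is spread across the whole 10 minute window, and later heartbeats stay spread because the period after the first one is unchanged.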

29

Concurrency: Introduction

•  Concurrency of 2 streams is managed through the concept of Playout Slots

•  A playout slot keeps track of a currently playing stream

•  Slots are allocated on playout initiation

[Diagram: the NOW TV platform stores an account's playout slots in Mongo DB, adding a slot on each Play request]

Before any playout:

{ "playouts": [] }

After the first Play:

{ "playouts": [ { "id": "ABC123", "heartbeat": "<timestamp>", "content": "<content_id>" } ] }

After the second Play:

{ "playouts": [ { "id": "ABC123", "heartbeat": "<timestamp>", "content": "<content_id>" }, { "id": "DEF456", "heartbeat": "<timestamp>", "content": "<content_id>" } ] }

33

Concurrency: Introduction

•  Slots are updated on heartbeats to refresh the time stamp

•  Slots are terminated on an END event

[Diagram: the NOW TV platform updates the account's playout slots in Mongo DB on END and Play events]

With two streams playing:

{ "playouts": [ { "id": "ABC123", "heartbeat": "<timestamp>", "content": "<content_id>" }, { "id": "DEF456", "heartbeat": "<timestamp>", "content": "<content_id>" } ] }

After an END event for ABC123:

{ "playouts": [ { "id": "DEF456", "heartbeat": "<timestamp>", "content": "<content_id>" } ] }

After a new Play:

{ "playouts": [ { "id": "DEF456", "heartbeat": "<timestamp>", "content": "<content_id>" }, { "id": "CBF789", "heartbeat": "<timestamp>", "content": "<content_id>" } ] }
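The slot lifecycle above (allocate on Play, refresh on heartbeat, release on END) could look roughly like this in memory. The function names and the use of a plain dict in place of the Mongo DB document are assumptions of this sketch; the field names follow the documents shown on the slides.

```python
from typing import Dict, List

MAX_SLOTS = 2  # concurrency limit of 2 streams, as on the slides


def play(account: Dict[str, List[dict]], playout_id: str, content_id: str, now: float) -> bool:
    """Allocate a slot on playout initiation; refuse if the limit is reached."""
    if len(account["playouts"]) >= MAX_SLOTS:
        return False
    account["playouts"].append({"id": playout_id, "heartbeat": now, "content": content_id})
    return True


def heartbeat(account: Dict[str, List[dict]], playout_id: str, now: float) -> bool:
    """Refresh the time stamp of an existing slot."""
    for slot in account["playouts"]:
        if slot["id"] == playout_id:
            slot["heartbeat"] = now
            return True
    return False


def end(account: Dict[str, List[dict]], playout_id: str) -> None:
    """Release the slot on an END event."""
    account["playouts"] = [s for s in account["playouts"] if s["id"] != playout_id]


account = {"playouts": []}
first = play(account, "ABC123", "content1", now=0.0)
second = play(account, "DEF456", "content2", now=5.0)
third = play(account, "CBF789", "content3", now=10.0)  # rejected: both slots in use
heartbeat(account, "DEF456", now=600.0)
end(account, "ABC123")
retry = play(account, "CBF789", "content3", now=620.0)  # accepted: a slot was freed
print([s["id"] for s in account["playouts"]])
```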

36

Concurrency: Previously

•  Failure to receive an END event (due to an app crash or connectivity loss) blocked a slot until timeout

•  Previously, this blocked subsequent playouts for up to 10 minutes

•  “Concurrency limit reached” errors were seen after our service had been restored on GoT night

[Diagram: both slots are still held in Mongo DB when a new Play request arrives at NOW TV]

{ "playouts": [ { "id": "ABC123", "heartbeat": "<timestamp>", "content": "<content_id>" }, { "id": "DEF456", "heartbeat": "<timestamp>", "content": "<content_id>" } ] }

39

Concurrency: Today

•  Now, slots allocated to the same Device ID can be ‘reclaimed’

•  No more “Concurrency limit reached” errors following app crashes or service outages

[Diagram: devices box1 and box2 each hold a slot in Mongo DB; box1 sends a new Play request to NOW TV]

Before the new Play from box1:

{ "playouts": [ { "id": "ABC123", "heartbeat": "<timestamp>", "content": "<content_id>", "deviceId": "box1" }, { "id": "DEF456", "heartbeat": "<timestamp>", "content": "<content_id>", "deviceId": "box2" } ] }

After the new Play from box1:

{ "playouts": [ { "id": "FCE987", "heartbeat": "<timestamp>", "content": "<content_id>", "deviceId": "box1" }, { "id": "DEF456", "heartbeat": "<timestamp>", "content": "<content_id>", "deviceId": "box2" } ] }
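A minimal sketch of slot reclaiming by Device ID, using the document shapes from the slides. The function name `play_with_reclaim`, the third device `box3` and the in-memory dict are illustrative assumptions; how the platform actually performs the update in Mongo DB is not shown in the talk.

```python
from typing import Dict, List

MAX_SLOTS = 2  # concurrency limit of 2 streams


def play_with_reclaim(account: Dict[str, List[dict]], playout_id: str,
                      content_id: str, device_id: str, now: float) -> bool:
    """Allocate a slot, reusing any slot already held by the same device."""
    slots = account["playouts"]
    new_slot = {"id": playout_id, "heartbeat": now,
                "content": content_id, "deviceId": device_id}
    for i, slot in enumerate(slots):
        if slot["deviceId"] == device_id:
            slots[i] = new_slot  # reclaim the stale slot left by a crash or outage
            return True
    if len(slots) >= MAX_SLOTS:
        return False  # genuinely at the limit: other devices hold both slots
    slots.append(new_slot)
    return True


account = {"playouts": [
    {"id": "ABC123", "heartbeat": 0.0, "content": "content1", "deviceId": "box1"},
    {"id": "DEF456", "heartbeat": 0.0, "content": "content2", "deviceId": "box2"},
]}

# box1 crashed without sending END, then starts a new stream.
reclaimed = play_with_reclaim(account, "FCE987", "content1", "box1", now=30.0)
# A third device is still refused while box1 and box2 hold the slots.
refused = play_with_reclaim(account, "AAA111", "content3", "box3", now=31.0)
print([(s["id"], s["deviceId"]) for s in account["playouts"]])
```

The concurrency limit is still enforced across devices; only a device's own stale slot is replaced.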

42

Entitlements: Introduction

•  Entitlement is granted based upon the products purchased and the content being consumed

•  Products and content are tagged with entitlement tags

•  Tag intersection indicates entitlement to consume

[Diagram: tag matching between items tagged sports and entertainment and items tagged entertainment, movies and sports]
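As a sketch of the tag-intersection rule: a customer is entitled when the tags granted by their products share at least one tag with the content. The product names and the `PRODUCT_TAGS` mapping are hypothetical; only the tag names come from the slide.

```python
from typing import Dict, Iterable, Set

# Hypothetical products and the entitlement tags they carry.
PRODUCT_TAGS: Dict[str, Set[str]] = {
    "sports_pass": {"sports"},
    "entertainment_pass": {"entertainment"},
    "movies_pass": {"movies"},
}


def customer_tags(products: Iterable[str]) -> Set[str]:
    """Union of the entitlement tags across everything the customer has purchased."""
    tags: Set[str] = set()
    for product in products:
        tags |= PRODUCT_TAGS.get(product, set())
    return tags


def is_entitled(products: Iterable[str], content_tags: Iterable[str]) -> bool:
    """Entitled when product tags and content tags intersect."""
    return bool(customer_tags(products) & set(content_tags))


purchased = ["sports_pass", "entertainment_pass"]
print(is_entitled(purchased, {"entertainment"}))
print(is_entitled(purchased, {"movies"}))
print(is_entitled(purchased, {"sports"}))
```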

43

Entitlements: Previously

•  Entitlement checking was not efficient – checked by content ID

–  /entitlement/movie/<id>

–  /entitlement/episode/<id>

–  /entitlement/stream/<id>

•  Entitlement was checked on every details page before any call-to-action

•  Content tags almost never changed

44

Entitlements: Today

•  Entitlement checking by tag(s) was introduced

–  /entitlement/tags/movies

•  Entitlement checking now only needs to occur once per collection or ‘section’ of the app

•  Where entitlement checking by content ID is still necessary, tags are cached in memory
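One way to picture the in-memory tag cache for checks that still go by content ID. `CONTENT_TAGS` stands in for the content metadata store, and `functools.lru_cache` is simply a convenient Python cache for the sketch; the platform's actual caching mechanism and any invalidation strategy are not described in the talk.

```python
from functools import lru_cache
from typing import FrozenSet, Iterable

# Hypothetical stand-in for the content metadata store.
CONTENT_TAGS = {
    "movie1": frozenset({"movies"}),
    "stream1": frozenset({"sports"}),
}
stats = {"metadata_lookups": 0}  # counts trips to the (pretend) metadata store


@lru_cache(maxsize=None)
def tags_for_content(content_id: str) -> FrozenSet[str]:
    """Fetch a content item's tags once, then serve them from memory."""
    stats["metadata_lookups"] += 1
    return CONTENT_TAGS.get(content_id, frozenset())


def entitled_by_tags(customer_tags: Iterable[str], requested_tags: Iterable[str]) -> bool:
    """Section-level check, in the spirit of /entitlement/tags/movies."""
    return not set(customer_tags).isdisjoint(requested_tags)


def entitled_by_content_id(customer_tags: Iterable[str], content_id: str) -> bool:
    """Content-level check that resolves the content's tags through the cache."""
    return entitled_by_tags(customer_tags, tags_for_content(content_id))


customer = {"movies", "entertainment"}
print(entitled_by_tags(customer, {"movies"}))
print(entitled_by_content_id(customer, "movie1"))
print(entitled_by_content_id(customer, "movie1"))  # served from the cache
print("metadata lookups:", stats["metadata_lookups"])
```

Caching without expiry is only reasonable here because, as the slides note, content tags almost never change.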

45

Product Storage: Previously

•  Every purchase and renewal of any product resulted in a new Product entity in Mongo DB

[Diagram: one entity per purchase or renewal]

–  Entertainment: June, July, August, September, October, November 2015

–  Movies: August, September, October, November 2015

–  Sports: 20th July 2015, 12th September 2015

46

Product Storage: Today

•  We store entitlement entities instead of products, updating on renewals rather than duplicating

[Diagram: the same purchases and renewals as on the previous slide]

–  Entertainment: June, July, August, September, October, November 2015

–  Movies: August, September, October, November 2015

–  Sports: 20th July 2015, 12th September 2015
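The difference between the two storage models can be sketched with an append-only list versus a keyed update. The field names (`accountId`, `product`, `period`, `validFor`) and the in-memory containers are assumptions for illustration; the real entity schema is not shown on the slides.

```python
from typing import Dict, List, Tuple


def record_previously(products: List[dict], account_id: str, product: str, period: str) -> None:
    """Old model: every purchase or renewal appends a new Product entity."""
    products.append({"accountId": account_id, "product": product, "period": period})


def record_today(entitlements: Dict[Tuple[str, str], dict],
                 account_id: str, product: str, period: str) -> None:
    """New model: one entitlement entity per account and product, updated on renewal."""
    entitlements[(account_id, product)] = {"validFor": period}


products: List[dict] = []
entitlements: Dict[Tuple[str, str], dict] = {}

# Six monthly Entertainment renewals, June to November 2015.
for period in ["2015-06", "2015-07", "2015-08", "2015-09", "2015-10", "2015-11"]:
    record_previously(products, "account1", "entertainment", period)
    record_today(entitlements, "account1", "entertainment", period)

print("entities stored previously:", len(products))
print("entities stored today:", len(entitlements))
```

In the old model the number of entities grows with every renewal; in the new one it grows only with the number of distinct entitlements per account.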

47

Viewings & Bookmarks: Introduction

•  Viewing a VOD asset => Viewing

•  Heartbeating during a VOD asset => Bookmark

•  Viewings and Bookmarks were stored separately

•  No capping or archiving


48

Viewings & Bookmarks: Previously

•  Upon fetching a customer’s viewing history, multiple database queries were made:

–  1 query to the viewings collection to fetch n viewings for the customer

–  n queries to the bookmarks collection to fetch the bookmark position for each viewing

–  TOTAL: n + 1 Mongo DB queries for a single request!

–  Some customers had thousands of items in their viewing history!

Viewings:

{ "_id": "abc123", "accountId": "account1", "contentId": "movie1", "timestamp": "<timestamp>" }

{ "_id": "bcd345", "accountId": "account1", "contentId": "movie2", "timestamp": "<timestamp>" }

{ "_id": "cde456", "accountId": "account1", "contentId": "episode1", "timestamp": "<timestamp>" }

Bookmarks:

{ "_id": "fed987", "accountId": "account1", "contentId": "movie1", "position": 1187 }

{ "_id": "edc765", "accountId": "account1", "contentId": "movie2", "position": 2854 }

{ "_id": "dcb543", "accountId": "account1", "contentId": "episode1", "position": 3542 }

49

Viewings & Bookmarks: Today

•  The original reason for keeping viewings and bookmarks separate was no longer apparent

•  Now, viewings and bookmarks are merged

–  Unnecessary document ID replaced with compound ID – improving indexing efficiency

–  Shortened field names – reducing storage consumption and further improving indexing efficiency

Viewing (previously):

{ "_id": "abc123", "accountId": "account1", "contentId": "movie1", "timestamp": "<timestamp>" }

Bookmark (previously):

{ "_id": "fed987", "accountId": "account1", "contentId": "movie1", "position": 1187 }

View History (merged, with compound ID):

{ "_id": { "accountId": "account1", "contentId": "movie1" }, "position": 1187, "timestamp": "<timestamp>" }

View History (merged, with shortened field names):

{ "_id": { "aid": "account1", "cid": "movie1" }, "pos": 1187, "ts": "<timestamp>" }
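To make the n + 1 pattern concrete, here is a small in-memory sketch that counts queries under both data models. `CountingCollection` is a made-up stand-in for a Mongo DB collection; only the document shapes are taken from the slides.

```python
from typing import Callable, List, Optional, Tuple


class CountingCollection:
    """In-memory stand-in for a collection that counts queries made against it."""

    def __init__(self, docs: List[dict]) -> None:
        self.docs = docs
        self.queries = 0

    def find(self, predicate: Callable[[dict], bool]) -> List[dict]:
        self.queries += 1
        return [d for d in self.docs if predicate(d)]


def history_previously(viewings: CountingCollection, bookmarks: CountingCollection,
                       account_id: str) -> List[Tuple[str, Optional[int]]]:
    result = []
    for v in viewings.find(lambda d: d["accountId"] == account_id):  # 1 query
        marks = bookmarks.find(  # 1 further query per viewing
            lambda d, cid=v["contentId"]: d["accountId"] == account_id and d["contentId"] == cid
        )
        result.append((v["contentId"], marks[0]["position"] if marks else None))
    return result


def history_today(view_history: CountingCollection, account_id: str) -> List[Tuple[str, int]]:
    docs = view_history.find(lambda d: d["_id"]["aid"] == account_id)  # 1 query in total
    return [(d["_id"]["cid"], d["pos"]) for d in docs]


items = [("movie1", 1187), ("movie2", 2854), ("episode1", 3542)]
viewings = CountingCollection(
    [{"accountId": "account1", "contentId": c, "timestamp": "<timestamp>"} for c, _ in items])
bookmarks = CountingCollection(
    [{"accountId": "account1", "contentId": c, "position": p} for c, p in items])
view_history = CountingCollection(
    [{"_id": {"aid": "account1", "cid": c}, "pos": p, "ts": "<timestamp>"} for c, p in items])

old = history_previously(viewings, bookmarks, "account1")
new = history_today(view_history, "account1")
print("queries previously:", viewings.queries + bookmarks.queries)
print("queries today:", view_history.queries)
```

For a customer with thousands of items, the old model scales the query count with the size of the history, while the merged model stays at a single query.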

50

Mongo Indexes

Viewing

Document: { "_id": "abc123", "accountId": "account1", "contentId": "movie1", "timestamp": "<timestamp>" }

Index entries:

{ "_id": "abc123" }

{ "_id": "abc123", "accountId": "account1" }

{ "_id": "abc123", "accountId": "account1", "timestamp": "<timestamp>" }

{ "_id": "abc123", "accountId": "account1", "contentId": "movie1" }

Bookmark

Document: { "_id": "fed987", "accountId": "account1", "contentId": "movie1", "position": 1187 }

Index entries:

{ "_id": "fed987" }

{ "_id": "fed987", "accountId": "account1", "contentId": "movie1" }

View History

Document: { "_id": { "aid": "account1", "cid": "movie1" }, "pos": 1187, "ts": "<timestamp>" }

Index entries:

{ "_id": { "aid": "account1", "cid": "movie1" } }

{ "_id.aid": "account1", "ts": "<timestamp>" }

51

Mongo Write Locks: Previously

[Diagram: one Mongo instance containing two databases. Database 1 holds Collection 1 and Collection 2; Database 2 holds Collection 3 and Collection 4. Each collection holds several documents.]

52

Mongo Write Locks: Today

[Diagram: one Mongo instance containing four databases (Database 1 to Database 4), with Collections 1 to 4 spread across them so that each collection sits in its own database. Each collection holds several documents.]
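A toy model of why the layout matters, assuming the write lock is taken per database, as it was in the MongoDB 2.x releases of that period (the slides do not state the version or the lock granularity explicitly). The layouts mirror the two diagrams; the names are placeholders.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def contending_pairs(layout: Dict[str, str]) -> List[Tuple[str, str]]:
    """Pairs of collections whose writes would share one database-level lock."""
    return [
        (a, b)
        for a, b in combinations(sorted(layout), 2)
        if layout[a] == layout[b]
    ]


# Mapping of collection name to the database that holds it.
previously = {
    "collection1": "database1",
    "collection2": "database1",
    "collection3": "database2",
    "collection4": "database2",
}
today = {
    "collection1": "database1",
    "collection2": "database2",
    "collection3": "database3",
    "collection4": "database4",
}

print("contending pairs previously:", contending_pairs(previously))
print("contending pairs today:", contending_pairs(today))
```

Under that assumption, giving each busy collection its own database means a heavy writer on one collection no longer blocks writes to the others.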

53

Performance Testing

[Chart comparing Game of Thrones 2014 with the load capacity tests of March 2015]

54

NOW TV Customer Base 2014 - 2015

•  Our customer base TRIPLED, again, in the year up to April 2015

[Chart: customer base, 2013, 2014 and 2015]


56

What happened?

•  Good platform availability throughout

•  2.5x the load that affected us just one year earlier

•  Twice the normal concurrency for a typical Monday night

57

What did our customers say?

58

Were there any issues?

•  #bufferingBoobs

59

Recognition

MongoDB Innovation Award 2015 recognises organisations that are creating ground-breaking applications. These projects represent the best and most innovative work in the industry over the last year.

DTG Innovation Award 2015 recognises organisations which have driven innovation in a particular technology or sector.

60

What’s Next For NOW TV?

•  Our growth is expected to continue along the same trajectory

•  Moving to active-active datacentre architecture for increased resiliency

•  Cloud-based ‘overflow’ scaling for high-load events

•  Microservices

•  Sub-system resiliency

61

Credits

•  The entire NOW TV Technology team are credited with our success

–  Platform Software Engineers

–  Platform Quality Assurance Engineers

–  Dev-Ops Engineers

–  App Developers & Testers

–  Analysts, scrum masters and management

•  Be a part of our future success, work for NOW TV at Sky

–  www.workforsky.com

–  @workforsky

Thank you. Any questions?

@tommaule

[email protected]