How we scaled push messaging for millions of Netflix devices

Susheel Aroskar, Cloud Gateway


Why do we need push?


How I spend my time in Netflix application...

● What is push?
● How you can build it
● How you can operate it
● What can you do with it


Susheel Aroskar

Senior Software Engineer, Cloud Gateway

[email protected]

github.com/raksoras @susheelaroskar

PERSIST UNTIL SOMETHING HAPPENS

Zuul Push Architecture

[Diagram: Zuul Push architecture, built up step by step]

Components: Message Processor, Push Library, Push Message Queue, Push Registry, Zuul Push Servers

Clients connect to the Zuul Push Servers over WebSockets / SSE

Steps: Register User (in the Push Registry), Lookup server (in the Push Registry), Deliver message (to the Zuul Push Server holding the connection)
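
The register, lookup and deliver steps labeled on the architecture slide can be sketched in plain Java. This is a minimal in-memory illustration, not Zuul Push code: the class `PushFlowSketch`, its method names, and the maps standing in for the Push Registry and the push servers are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical, single-threaded stand-in for the flow on the slide.
final class PushFlowSketch {
    // Push Registry: user id -> id of the push server holding that user's connection.
    private final Map<String, String> registry = new HashMap<>();
    // Stand-in for the push servers: server id -> messages handed to that server.
    private final Map<String, List<String>> delivered = new HashMap<>();

    // "Register User": called once a client has connected to a push server.
    void registerUser(String userId, String serverId) {
        registry.put(userId, serverId);
    }

    // "Lookup server": which push server currently holds this user's connection?
    Optional<String> lookupServer(String userId) {
        return Optional.ofNullable(registry.get(userId));
    }

    // "Deliver message": route one queued message to the right push server.
    // Returns false when the user has no registered connection.
    boolean process(String userId, String message) {
        Optional<String> server = lookupServer(userId);
        if (!server.isPresent()) {
            return false;
        }
        delivered.computeIfAbsent(server.get(), k -> new ArrayList<>())
                 .add(userId + ":" + message);
        return true;
    }

    // Messages handed to a given server so far (empty if none).
    List<String> deliveredTo(String serverId) {
        return delivered.getOrDefault(serverId, new ArrayList<>());
    }
}
```

In a real deployment the registry and the queue would be external services; the sketch only shows how a lookup keyed by user lets the processor reach the one server that holds the connection.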

Handling millions of persistent connections

Zuul Push server

C10K challenge

[Diagram: Thread per Connection vs. Async I/O]

Thread per Connection: Thread-1 and Thread-2 each perform the Read and Write for their own Socket

Async I/O: a Single Thread serves multiple Sockets through read callbacks and write callbacks

Netty

[Diagram: Channel Pipeline attached to the SOCKET, running from Head to Tail through ChannelInboundHandler and ChannelOutboundHandler stages]

protected void addPushHandlers(ChannelPipeline pl) {
    // HTTP codec and aggregator for the initial HTTP request
    // (constructor arguments are omitted on the slide)
    pl.addLast(new HttpServerCodec());
    pl.addLast(new HttpObjectAggregator());
    // Authenticate the request
    pl.addLast(getPushAuthHandler());
    // WebSocket compression and protocol handling
    pl.addLast(new WebSocketServerCompressionHandler());
    pl.addLast(new WebSocketServerProtocolHandler());
    // Register the connection
    pl.addLast(getPushRegistrationHandler());
}


Authenticate by Cookies, JWT or any other custom scheme

Plug in your custom authentication policy

Tracking clients’ connection metadata in real time

Push Registry

public class MyRegistration extends PushRegistrationHandler {

    @Override
    protected void registerClient(
            ChannelHandlerContext ctx,
            PushUserAuth auth,
            PushConnection conn,
            PushConnectionRegistry registry) {
        // Run the default registration first.
        super.registerClient(ctx, auth, conn, registry);
        // Schedule the Redis write as a separate task on the channel's executor.
        ctx.executor().submit(() -> storeInRedis(auth));
    }
}

Push registry features checklist

● Low read latency
● Record expiry
● Sharding
● Replication
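
Of the checklist items, record expiry is the easiest to show in code. A record that is never refreshed should stop being returned, so that a connection which went away without a clean deregistration does not leave a stale entry behind. The sketch below is an in-memory illustration only: `ExpiringRegistry`, its TTL parameter and the injected clock are assumptions for the example, and a real registry would rely on the expiry feature of its data store instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.LongSupplier;

// Hypothetical in-memory registry with per-record expiry (single-threaded sketch).
final class ExpiringRegistry {
    private static final class Entry {
        final String serverId;
        final long expiresAtMillis;

        Entry(String serverId, long expiresAtMillis) {
            this.serverId = serverId;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> records = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected so expiry can be exercised without sleeping

    ExpiringRegistry(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Store or refresh a record; it expires ttlMillis after this call.
    void register(String userId, String serverId) {
        records.put(userId, new Entry(serverId, clock.getAsLong() + ttlMillis));
    }

    // Return the server for a user, treating expired records as absent.
    Optional<String> lookup(String userId) {
        Entry e = records.get(userId);
        if (e == null) {
            return Optional.empty();
        }
        if (clock.getAsLong() >= e.expiresAtMillis) {
            records.remove(userId); // lazily drop the stale record
            return Optional.empty();
        }
        return Optional.of(e.serverId);
    }
}
```

Passing `System::currentTimeMillis` as the clock gives wall-clock expiry; a fixed or hand-advanced clock makes the behavior easy to check.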


What we use

https://github.com/Netflix/dynomite

Redis + Auto-sharding + Read/Write quorum + Cross-region replication

Dynomite

Message Processing

Queue, Route, Deliver

We use Kafka message queues to decouple message senders from receivers

Fire and Forget

Cross-region Replication

Different queues for different priorities

We run multiple message processor instances in parallel to scale our message processing throughput.

Operating Zuul Push

Different than REST of them

Persistent connections make Zuul Push server stateful

Long lived stable connections
○ Great for client efficiency
○ Terrible for quick deploy/rollback

If you love your clients set them free...

Tear down connections periodically

Randomize each connection’s lifetime

[Chart: # reconnects (y-axis) over Time (x-axis)]

Effect of randomizing connection lifetime on reconnect peaks
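
One way to randomize each connection's lifetime is to add a uniform random offset to a base lifetime, so that connections opened at the same moment do not all come back at the same moment. The helper below is a sketch under that assumption; `ConnectionLifetime`, its parameters and the uniform distribution are illustrative choices, not taken from the slides.

```java
import java.util.Random;

// Hypothetical helper: pick a per-connection lifetime inside a jitter window.
final class ConnectionLifetime {
    private ConnectionLifetime() {}

    // Returns a lifetime in [baseMillis - jitterMillis, baseMillis + jitterMillis].
    static long randomizedLifetimeMillis(long baseMillis, long jitterMillis, Random random) {
        if (jitterMillis < 0 || jitterMillis > baseMillis) {
            throw new IllegalArgumentException("jitter must be between 0 and base");
        }
        // nextDouble() is in [0, 1), so the scaled value is in [-jitter, +jitter)
        // and rounds to a whole number of milliseconds within [-jitter, +jitter].
        long offset = Math.round((random.nextDouble() * 2.0 - 1.0) * jitterMillis);
        return baseMillis + offset;
    }
}
```

For example, `randomizedLifetimeMillis(30 * 60_000L, 5 * 60_000L, new Random())` yields a lifetime between 25 and 35 minutes; those figures are placeholders for whatever "tens of minutes" means in a given deployment.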

Ask client to close its connection.

Most connections are idle!

How to optimize push server

BIG Server, tons of connections

ulimit -n 262144
net.ipv4.tcp_rmem="4096 87380 16777216"
net.ipv4.tcp_wmem="4096 87380 16777216"


Goldilocks strategy

Optimize for cost, NOT instance count

How to auto-scale?

RPS? CPU??

Open Connections
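
Since most connections are idle, request rate and CPU say little about how full a push server is, while the number of open connections does. A scaling rule based on that metric can be as simple as a ceiling division. The sketch below assumes a target number of connections per instance and a minimum fleet size; `ConnectionAutoScaler` and its parameters are hypothetical names for illustration.

```java
// Hypothetical sizing helper: scale on open connections rather than RPS or CPU.
final class ConnectionAutoScaler {
    private ConnectionAutoScaler() {}

    // Smallest instance count that keeps the average number of open connections
    // per instance at or below the target, never going below minInstances.
    static int desiredInstances(long totalOpenConnections,
                                long targetConnectionsPerInstance,
                                int minInstances) {
        if (totalOpenConnections < 0 || targetConnectionsPerInstance <= 0 || minInstances < 0) {
            throw new IllegalArgumentException("invalid sizing parameters");
        }
        // Ceiling division without floating point.
        long needed = (totalOpenConnections + targetConnectionsPerInstance - 1)
                / targetConnectionsPerInstance;
        return Math.toIntExact(Math.max((long) minInstances, needed));
    }
}
```

With a target of 20,000 connections per instance, 100,000 open connections call for 5 instances and 100,001 call for 6; the target itself is an assumption and would come from load testing.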


Amazon Elastic Load Balancers cannot proxy WebSockets.

Solution - Run ELB as a TCP load balancer

OSI 7 network layers (conceptual)    HTTP over TCP/IP
7 Application                        HTTP
6 Presentation
5 Session
4 Transport                          TCP
3 Network                            IP
2 Data link                          Ethernet
1 Physical

Layer 7: HTTP (WebSocket Upgrade Request)
Layer 4: TCP

Managing push cluster - a quick recap

● Recycle connections after tens of minutes
● Randomize each connection’s lifetime
● Many smaller servers >> a few BIG servers
● Auto-scale on number of open connections per box
● WebSocket aware vs TCP load balancer

If you build it, they will push

On-demand diagnostics

Remote recovery

User messaging

WHAT WILL YOU USE IT FOR?

Call to action

PULL!

https://github.com/Netflix/zuul

In conclusion, push can make you rich (in functionality), thin (by getting rid of polling) and happy!

Thank you.

Questions?

Susheel Aroskar

Senior Software Engineer, Cloud Gateway

[email protected]

github.com/raksoras @susheelaroskar

Rich, exciting Apps

More efficient systems

Easy to customize

Easy to operate

Zuul Push

Battle tested