Gluecon 2013: Netflix API Crash Course

Netflix API Crash Course Building & Running the API in 30 minutes Ben Schmaus, Netflix May 2013, Gluecon [email protected] @schmaus

Presentation from Gluecon 2013 on building and running the Netflix API.

Transcript of Gluecon 2013 netflix api crash course

Page 1: Gluecon 2013   netflix api crash course

Netflix API Crash Course

Building & Running the API in 30 minutes

Ben Schmaus, Netflix

May 2013, Gluecon

[email protected] @schmaus

Page 2: Gluecon 2013   netflix api crash course

Streaming TV Shows & Movies Globally

Page 3: Gluecon 2013   netflix api crash course

> 1000 Devices

Page 4: Gluecon 2013   netflix api crash course

1/3 of Internet at peak

Page 5: Gluecon 2013   netflix api crash course

Programmer not Distributor

Page 6: Gluecon 2013   netflix api crash course

More than 36 million subscribers in over 40 countries

Page 7: Gluecon 2013   netflix api crash course

How does the API fit into the picture?

Page 8: Gluecon 2013   netflix api crash course

Personalization Engine, User Info, Movie Metadata, Ratings, Similar Movies, Instant Queue, A/B Test Engine

API

Page 9: Gluecon 2013   netflix api crash course

Personalization Engine, User Info, Movie Metadata, Ratings, Similar Movies, Instant Queue, A/B Test Engine

API

Enable UX Innovation

Insulate from Failure

Page 10: Gluecon 2013   netflix api crash course

> 2 Billion Requests per Day

Page 11: Gluecon 2013   netflix api crash course

Growth Over Time

Page 23: Gluecon 2013   netflix api crash course

Automation

Visibility

Operational awareness

Balance speed & quality

Page 24: Gluecon 2013   netflix api crash course

How's the API put together?

Page 25: Gluecon 2013   netflix api crash course

ELB, Routing Cluster, API Layer, Mid-tier Services, Backend App Cluster (x2)

Page 27: Gluecon 2013   netflix api crash course

Inside an API

App Server

RxJava

Hystrix

Service Client 1 Service Client 2 Service Client N
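
The slide places Hystrix between the app server and each service client, which is how the API is insulated from failing dependencies. As a rough illustration of that idea only (this is not the Hystrix API, and the class name, thresholds, and `clock` parameter are all assumptions made for this sketch), a toy wrapper in Python might return a fallback when a dependency call fails and stop calling the dependency after repeated failures:

```python
import time


class SimpleBreaker:
    """Toy circuit breaker: illustrative only, not the Hystrix API."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def is_open(self):
        if self.opened_at is None:
            return False
        # Once the cool-down has elapsed, a trial call is allowed through.
        return self.clock() - self.opened_at < self.reset_after

    def call(self, func, fallback):
        # While open, skip the dependency entirely and answer with the fallback.
        if self.is_open():
            return fallback()
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Each service client call would be wrapped as `breaker.call(lambda: client.fetch(), lambda: default_value)`, where `client.fetch` and `default_value` are placeholders for a real dependency call and its degraded response.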

Page 28: Gluecon 2013   netflix api crash course

Hystrix + RxJava Service Layer

Service Client (provided JAR)

Application Service

/device/endpoint (provided script)

Service

UI Teams

Mid-tier Service Teams

API Team

Page 29: Gluecon 2013   netflix api crash course

Continually changing UI scripts and mid-tier services

Functionality, resiliency and performance drift over time

Page 30: Gluecon 2013   netflix api crash course

Deployment & Ops

Page 31: Gluecon 2013   netflix api crash course

REMOVE MANUAL WORK pushing code to multiple AWS regions/clusters

ENABLE RAPID DEPLOYMENT of code despite limited visibility into how it's changed

KEEP TEAM INFORMED about what's happening in prod

MITIGATE RISK of systemic failure

Page 32: Gluecon 2013   netflix api crash course

Tools

Page 33: Gluecon 2013   netflix api crash course

End-to-end Traceability Using Python/Java Glue

Page 34: Gluecon 2013   netflix api crash course

Code Flow

Page 35: Gluecon 2013   netflix api crash course

Run 1% of your traffic on the new code and see how it does

Page 36: Gluecon 2013   netflix api crash course

API ami-123, API ami-456

2xx, 4xx, 5xx, latency, busy threads, load, ...

Page 37: Gluecon 2013   netflix api crash course

Manually looking at graphs, SSH-ing into servers and grep-ing logs doesn't scale (although we used to do that)

Page 38: Gluecon 2013   netflix api crash course

Confidence score for each AMI based on comparison of 1000+ metrics
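
The slides do not say how the confidence score is computed. One simple way to picture it, purely as an assumption, is the percentage of metrics on which the canary AMI stays within a tolerance of the baseline AMI. The function name, the 10% tolerance, and the sample values below are all made up for illustration:

```python
def canary_score(baseline, canary, tolerance=0.10):
    """Return the percentage of shared metrics where the canary value stays
    within `tolerance` (relative deviation) of the baseline value."""
    shared = sorted(set(baseline) & set(canary))
    if not shared:
        raise ValueError("no metrics in common")
    passed = 0
    for name in shared:
        base, new = baseline[name], canary[name]
        # Relative deviation; use an absolute scale when the baseline is zero.
        scale = abs(base) if base != 0 else 1.0
        if abs(new - base) / scale <= tolerance:
            passed += 1
    return 100.0 * passed / len(shared)


# Made-up sample metrics for the old AMI (baseline) and the new AMI (canary).
baseline = {"http_5xx_rate": 0.002, "latency_ms": 90.0, "busy_threads": 40.0, "load": 2.0}
canary = {"http_5xx_rate": 0.010, "latency_ms": 93.0, "busy_threads": 41.0, "load": 2.1}
score = canary_score(baseline, canary)
print(score)  # 75.0: only the 5xx rate deviates by more than 10%
```

A real comparison over 1000+ metrics would likely weight metrics by importance, as the next slide's visualization suggests, rather than count them equally.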

Page 39: Gluecon 2013   netflix api crash course

Scannable visualization of metric space

More important

Less important

Page 40: Gluecon 2013   netflix api crash course

Cross-reference Jira, Link to code diffs

Page 41: Gluecon 2013   netflix api crash course

Track lib changes

Page 42: Gluecon 2013   netflix api crash course

Easy to access report artifacts for each AMI

Page 43: Gluecon 2013   netflix api crash course

Your basic red/black push

Page 48: Gluecon 2013   netflix api crash course

Doing red/black by hand for multiple clusters across multiple regions is not fun

Page 49: Gluecon 2013   netflix api crash course

Automate multi-cluster/region pushes

Page 50: Gluecon 2013   netflix api crash course

Automate multi-cluster/region pushes

Don't forget to automate rollbacks, too!
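
As a minimal sketch of what an automated multi-cluster push with rollback could look like, the Python below uses an in-memory `Cluster` stand-in. It does not call any real AWS or Netflix tooling, and every name in it is hypothetical:

```python
class Cluster:
    """In-memory stand-in for one cluster in one region (hypothetical)."""

    def __init__(self, region, name, live_ami):
        self.region = region
        self.name = name
        self.live_ami = live_ami    # AMI currently taking traffic
        self.standby_ami = None     # previous AMI kept around for rollback

    def push(self, new_ami):
        # Red/black: bring up the new AMI, shift traffic, keep the old one idle.
        self.standby_ami, self.live_ami = self.live_ami, new_ami

    def rollback(self):
        # Shift traffic back to the AMI that was kept on standby.
        if self.standby_ami is not None:
            self.live_ami, self.standby_ami = self.standby_ami, None


def push_everywhere(clusters, new_ami, is_healthy):
    """Push to each cluster in turn; as soon as one fails its health check,
    roll back every cluster pushed so far and report failure."""
    pushed = []
    for cluster in clusters:
        cluster.push(new_ami)
        pushed.append(cluster)
        if not is_healthy(cluster):
            for done in reversed(pushed):
                done.rollback()
            return False
    return True
```

Here `is_healthy` is a caller-supplied check; in practice it might consult the same metrics used for the canary comparison.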

Page 51: Gluecon 2013   netflix api crash course

$Who, $What, $Where, $When

e.g., "bschmaus, ami-123, Sandbox Canary, 2013-05-06 19:05"

Latest prod change in chat topic
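
Building that topic line is a one-liner. The sketch below reproduces the example string from the slide; the function name is an assumption, and posting the result to a chat system is left out because the slides do not name one:

```python
from datetime import datetime


def change_topic(who, what, where, when):
    """Format the $Who, $What, $Where, $When line for a chat topic."""
    return "{}, {}, {}, {}".format(who, what, where, when.strftime("%Y-%m-%d %H:%M"))


topic = change_topic("bschmaus", "ami-123", "Sandbox Canary", datetime(2013, 5, 6, 19, 5))
print(topic)  # bschmaus, ami-123, Sandbox Canary, 2013-05-06 19:05
```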

Page 52: Gluecon 2013   netflix api crash course

Quickly see status of all clusters in a region

Page 53: Gluecon 2013   netflix api crash course

What the #%*! just happened!?

Page 54: Gluecon 2013   netflix api crash course

Historical & realtime metrics, sort realtime by error/request rate

Page 55: Gluecon 2013   netflix api crash course

Distributed grep + tail

2013-05-09.20:38:54 MX 200 us-east-1c i-1824cb73 i-1c61b77f prod NFPS3-001-8G50FJCX... 288404769389848058 90ms api-global.netflix.com GET /tvui/release/470/plus/pathEvaluator -
amazon.ami-id: ami-502eb039
amazon.availability-zone: us-east-1c
amazon.instance-id: i-1824cb73
amazon.instance-type: m2.2xlarge
amazon.local-ipv4: 10.6.213.112
amazon.public-hostname: ec2-54-243-4-69.compute-1.amazonaws.com
amazon.public-ipv4: 54.243.4.69
cookie_esn: NFPS3-001-8G50FJCX...
country: MX
currentTime: 1368131934468
duration-millis: 90
esn: NFPS3-001-8G50FJCX...
geo.city: CIUDADOBREGON...

$ ./simple_stream.py -f -q 'e["country"]=="MX" && e["esn"]==~/NFPS3.*/' -r us
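
The internals of `simple_stream.py` are not shown in the slides. To illustrate what the example query selects, here is a small Python equivalent of the filter, assuming events arrive as dictionaries and that `==~` means a full-string regex match. The function name and the sample events are invented for this sketch:

```python
import re


def matching_events(events, country, esn_pattern):
    """Yield events whose country equals `country` and whose ESN fully
    matches `esn_pattern`, mirroring the example query."""
    esn_re = re.compile(esn_pattern)
    for e in events:
        if e.get("country") == country and esn_re.fullmatch(e.get("esn", "")):
            yield e


# Invented sample events; real ones carry many more fields.
events = [
    {"country": "MX", "esn": "NFPS3-001-EXAMPLE", "status": 200},
    {"country": "US", "esn": "NFPS3-001-EXAMPLE", "status": 200},
    {"country": "MX", "esn": "OTHER-DEVICE", "status": 500},
]
hits = list(matching_events(events, "MX", r"NFPS3.*"))
print(len(hits))  # 1
```

Because the function is a generator, it can filter an unbounded stream one event at a time, which is what a tail-style tool needs.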

Page 56: Gluecon 2013   netflix api crash course

Aim for the haystack handing you the needle

Page 57: Gluecon 2013   netflix api crash course

Or at least be able to make smaller haystacks

Page 58: Gluecon 2013   netflix api crash course

Continuously experiment to make hard things easier

Page 59: Gluecon 2013   netflix api crash course

Even with the best tools, building software is hard work.

Great engineers build great software.

Page 60: Gluecon 2013   netflix api crash course

Want to help us build the API?

[email protected] @schmaus