Scaling the Netflix API - From Atlassian Dev Den
Scaling the Netflix API
Daniel Jacobson, @daniel_jacobson
http://www.linkedin.com/in/danieljacobson
http://www.slideshare.net/danieljacobson
Please read the notes associated with each slide for the full context of the presentation.
What do I mean by “scale”?
But There Are Many Ways to Scale!
Organization
Systems
Devices
Development
Testing
But first, some background…
Global Streaming Video for TV Shows and Movies
More than 36 Million Subscribers
More than 40 Countries
Netflix Accounts for 33% of Peak Internet Traffic in North America
Netflix subscribers are watching more than 1 billion hours a month
2007
Netflix REST API: One-Size-Fits-All (OSFA) Solution
Image courtesy of Jay Mac 3 on Flickr
Netflix API Requests by Audience: At Launch in 2008
External Developers
Netflix API Requests by Audience: From 2011
External Developers
Global Streaming Product
Three aspects of the Streaming Product:
• Discovery
• Sign-Up
• Streaming
Member Sign-Up
Discovery
Today, the Netflix API Supports Discovery and Sign-Up
But Soon, It Will Support Streaming
Scaling…
Organization
Systems
Devices
Development
Testing
Distributed Architecture
1000+ Device Types
[Diagram: distributed services, including the Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, Reviews, and the A/B Test Engine]
Dozens of Dependencies
[Diagram: the API calling the Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, Reviews, and the A/B Test Engine]
http://www.slideshare.net/reed2001/culture-1798664
Scaling…
Organization
Systems
Devices
Development
Testing
System Resiliency
Distributed Architecture
Dependency Relationships
2,000,000,000 Requests Per Day to the Netflix API
30 Distinct Dependent Services for the Netflix API
14,000,000,000 Netflix API Calls Per Day to Those Dependent Services
0 Dependent Services with 100% SLA
$0.9999^{30} \approx 0.997$: a 99.99% SLA on each of the 30 dependencies yields about 99.7% availability overall
0.3% of 2B = 6M failures per day
2+ Hours of Downtime Per Month
$0.999^{30} \approx 0.970$: a 99.9% SLA on each of the 30 dependencies yields about 97% availability overall
3% of 2B = 60M failures per day
20+ Hours of Downtime Per Month
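The arithmetic behind these figures can be sketched in Java. This assumes, as the slides' multiplication does, that every request depends on all 30 services and that their failures are independent; the class and method names are illustrative only.

```java
// Aggregate availability of a request that depends on n services in series,
// assuming every dependency must succeed and failures are independent.
final class AggregateAvailability {

    /** Probability that all dependencies succeed, each with the given availability. */
    static double aggregate(double perServiceAvailability, int dependencies) {
        return Math.pow(perServiceAvailability, dependencies);
    }

    /** Expected failed requests per day for the given daily request volume. */
    static double failuresPerDay(double aggregate, double requestsPerDay) {
        return (1.0 - aggregate) * requestsPerDay;
    }

    /** Expected downtime in hours over a 30-day (720-hour) month. */
    static double downtimeHoursPerMonth(double aggregate) {
        return (1.0 - aggregate) * 30 * 24;
    }

    public static void main(String[] args) {
        double requestsPerDay = 2_000_000_000d; // 2B requests per day, from the slides
        for (double sla : new double[] {0.9999, 0.999}) {
            double total = aggregate(sla, 30); // 30 dependent services, from the slides
            System.out.printf("%.2f%% -> %.2f%% aggregate, %.1fM failures/day, %.1f h down/month%n",
                    sla * 100, total * 100,
                    failuresPerDay(total, requestsPerDay) / 1e6,
                    downtimeHoursPerMonth(total));
        }
    }
}
```

Running it gives roughly 6 million failures and a little over 2 hours of downtime for the 99.99% case, and roughly 59 million failures and about 21 hours for the 99.9% case, in line with the slides' rounded numbers.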
[Diagram, repeated over several slides: the API and its dependent services (Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, Reviews, A/B Test Engine)]
Circuit Breaker Dashboard
Call Volume and Health / Last 10 Seconds
Call Volume / Last 2 Minutes
Successful Requests
Successful, But Slower Than Expected
Short-Circuited Requests, Delivering Fallbacks
Timeouts, Delivering Fallbacks
Thread Pool & Task Queue Full, Delivering Fallbacks
Exceptions, Delivering Fallbacks
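Error Rate: $\dfrac{\text{Short-Circuited} + \text{Timeouts} + \text{Rejected} + \text{Exceptions}}{\text{Successful} + \text{Short-Circuited} + \text{Timeouts} + \text{Rejected} + \text{Exceptions}}$, where Rejected counts requests refused because the thread pool and task queue were full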
Status of Fallback Circuit
Requests per Second, Over Last 10 Seconds
SLA Information
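The dashboard's error-rate formula can be expressed as a small Java record. This is a sketch assuming the four fallback categories count as errors and successful requests make up the rest of the denominator; `HealthCounts`, its methods, and the circuit-opening rule are hypothetical, not the dashboard's or Hystrix's actual classes.

```java
// Hypothetical snapshot of the five request counts shown for one dependency
// over its rolling 10-second window.
record HealthCounts(long successful, long shortCircuited, long timeouts,
                    long rejected, long exceptions) {

    /** Requests answered with a fallback instead of a real response. */
    long errors() {
        return shortCircuited + timeouts + rejected + exceptions;
    }

    /** Error rate as a percentage of all requests in the window; 0 when idle. */
    double errorPercentage() {
        long total = successful + errors();
        return total == 0 ? 0.0 : 100.0 * errors() / total;
    }

    /**
     * Illustrative tripping rule: open the circuit only when there is enough traffic
     * to judge and the error rate exceeds the threshold. Both parameters are assumptions.
     */
    boolean shouldOpenCircuit(long minimumRequests, double thresholdPercent) {
        long total = successful + errors();
        return total >= minimumRequests && errorPercentage() > thresholdPercent;
    }

    public static void main(String[] args) {
        HealthCounts window = new HealthCounts(900, 40, 30, 20, 10);
        System.out.printf("error rate %.1f%%, open circuit: %b%n",
                window.errorPercentage(), window.shouldOpenCircuit(20, 50.0));
    }
}
```

For the sample window, 100 of 1,000 requests fell back, so it prints an error rate of 10.0% and leaves the circuit closed.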
[Diagram, repeated over several slides: the API and its dependent services (Personalization Engine, User Info, Movie Metadata, Movie Ratings, Similar Movies, Reviews, A/B Test Engine), with a Fallback taking the place of a failing dependency's response]
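Serving a fallback when a dependency misbehaves can be sketched in plain Java. This is a simplified illustration, not Hystrix's API: each call runs on a bounded thread pool with a timeout, and a failure, timeout, rejection, or open circuit all return the caller-supplied fallback. The consecutive-failure threshold and retry window are assumed parameters; a rolling error rate like the dashboard's could drive the decision instead.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Simplified circuit breaker with fallbacks (an illustration, not Hystrix's API).
final class CircuitBreaker {
    private final ExecutorService pool;   // bounded pool isolates this dependency
    private final long timeoutMillis;     // give up on slow calls after this long
    private final int failureThreshold;   // consecutive failures that open the circuit
    private final long retryAfterMillis;  // how long the circuit stays open
    private int consecutiveFailures = 0;  // guarded by this
    private long openedAtMillis = 0;      // guarded by this

    CircuitBreaker(ExecutorService pool, long timeoutMillis, int failureThreshold, long retryAfterMillis) {
        this.pool = pool;
        this.timeoutMillis = timeoutMillis;
        this.failureThreshold = failureThreshold;
        this.retryAfterMillis = retryAfterMillis;
    }

    private synchronized boolean isOpen() {
        return consecutiveFailures >= failureThreshold
                && System.currentTimeMillis() - openedAtMillis < retryAfterMillis;
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAtMillis = System.currentTimeMillis(); // (re)open the circuit
        }
    }

    /** Runs the call, or returns the fallback if the circuit is open or the call fails. */
    <T> T execute(Callable<T> call, Supplier<T> fallback) {
        if (isOpen()) {
            return fallback.get(); // short-circuited: skip the dependency entirely
        }
        Future<T> future = null;
        try {
            future = pool.submit(call); // throws RejectedExecutionException when the queue is full
            T result = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            onSuccess();
            return result;
        } catch (TimeoutException | ExecutionException | RejectedExecutionException e) {
            if (future != null) {
                future.cancel(true); // stop tying up a thread on a slow dependency
            }
            onFailure();
            return fallback.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        // Two worker threads and a four-slot queue; extra work is rejected rather than queued.
        ExecutorService pool = new ThreadPoolExecutor(2, 2, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(4));
        CircuitBreaker ratings = new CircuitBreaker(pool, 100, 3, 5_000);
        for (int i = 0; i < 5; i++) {
            String stars = ratings.execute(
                    () -> { throw new IllegalStateException("ratings service down"); },
                    () -> "no rating (fallback)");
            System.out.println(stars);
        }
        pool.shutdownNow();
    }
}
```

After the third failure the circuit opens, so the remaining calls return the fallback without touching the failing service at all.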
System Infrastructure
AWS Cloud
Autoscaling
Forced Failure
Global System
More than 36 Million Subscribers
More than 40 Countries
Zuul: Gatekeeper for the Netflix Streaming Application
Zuul
• Multi-Region Resiliency
• Insights
• Stress Testing
• Canary Testing
• Dynamic Routing
• Load Shedding
• Security
• Static Response Handling
• Authentication
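A gatekeeper like this is commonly built as a chain of filters that run before a request is routed. The Java sketch below illustrates three of the listed roles (static response handling, authentication, and load shedding); the `Gatekeeper` class, its `Filter` interface, the concurrency limit, and the `Authorization` header check are hypothetical and are not Zuul's own filter API.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical edge gatekeeper: pre-routing filters, then load shedding, then the origin.
final class Gatekeeper {
    record Request(String path, Map<String, String> headers) {}
    record Response(int status, String body) {}

    /** A pre-routing filter returns a response to end the chain, or empty to continue. */
    interface Filter extends Function<Request, Optional<Response>> {}

    private final List<Filter> filters;
    private final int maxInFlight; // assumed concurrency limit
    private final AtomicInteger inFlight = new AtomicInteger();

    Gatekeeper(List<Filter> filters, int maxInFlight) {
        this.filters = List.copyOf(filters);
        this.maxInFlight = maxInFlight;
    }

    Response handle(Request request, Function<Request, Response> origin) {
        for (Filter filter : filters) {
            Optional<Response> early = filter.apply(request);
            if (early.isPresent()) {
                return early.get();
            }
        }
        // Load shedding: refuse work up front rather than overload the origin.
        if (inFlight.incrementAndGet() > maxInFlight) {
            inFlight.decrementAndGet();
            return new Response(503, "service unavailable");
        }
        try {
            return origin.apply(request);
        } finally {
            inFlight.decrementAndGet();
        }
    }

    public static void main(String[] args) {
        // Static response handling: answer health checks at the edge.
        Filter healthCheck = r -> r.path().equals("/healthcheck")
                ? Optional.of(new Response(200, "OK"))
                : Optional.empty();
        // Authentication: reject requests without credentials before routing.
        Filter requireAuth = r -> r.headers().containsKey("Authorization")
                ? Optional.empty()
                : Optional.of(new Response(401, "unauthorized"));

        Gatekeeper edge = new Gatekeeper(List.of(healthCheck, requireAuth), 100);
        Function<Request, Response> origin = r -> new Response(200, "routed " + r.path());

        System.out.println(edge.handle(new Request("/healthcheck", Map.of()), origin));
        System.out.println(edge.handle(new Request("/ps3/homescreen", Map.of()), origin));
        System.out.println(edge.handle(new Request("/ps3/homescreen", Map.of("Authorization", "token")), origin));
    }
}
```

Because filters run in order, cheap checks such as health-check responses can end a request before authentication or routing ever happens.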
Isthmus
Scaling…
Organization
Systems
Devices
Development
Testing
Screen Real Estate
Controller
Technical Capabilities
One-Size-Fits-All API
[Diagram: many separate requests converging on the one-size-fits-all API]
Scaling…
Organization
Systems
Devices
Development
Testing
Courtesy of South Florida Classical Review
Resource-Based API vs. Experience-Based API
Resource-Based Requests
• /users/<id>/ratings/title
• /users/<id>/queues
• /users/<id>/queues/instant
• /users/<id>/recommendations
• /catalog/titles/movie
• /catalog/titles/series
• /catalog/people
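With resource-based requests, the device itself stitches a screen together from several endpoints and pays one network round trip per resource. A minimal sketch, assuming a hypothetical `http` function that stands in for a single GET across the network border; the paths follow the list above with `<id>` filled in.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical device-side code assembling a home screen from resource-based endpoints.
final class ResourceBasedClient {
    private final Function<String, String> http; // stand-in for one GET across the network border
    private int roundTrips = 0;

    ResourceBasedClient(Function<String, String> http) {
        this.http = http;
    }

    private String get(String path) {
        roundTrips++; // each resource is its own request from the device
        return http.apply(path);
    }

    /** Gathering and formatting both happen on the device. */
    Map<String, String> loadHomeScreen(String userId) {
        Map<String, String> screen = new LinkedHashMap<>();
        screen.put("queue", get("/users/" + userId + "/queues/instant"));
        screen.put("recommendations", get("/users/" + userId + "/recommendations"));
        screen.put("ratings", get("/users/" + userId + "/ratings/title"));
        return screen;
    }

    int roundTrips() {
        return roundTrips;
    }

    public static void main(String[] args) {
        ResourceBasedClient client = new ResourceBasedClient(path -> "{}");
        client.loadHomeScreen("123");
        System.out.println("round trips: " + client.roundTrips()); // prints "round trips: 3"
    }
}
```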
[Diagram: the REST API in front of the Recommendations, Movie Data, Similar Movies, Auth, Member Data, A/B Tests, Start-Up, and Ratings services, with the network border marked]
[Diagram: the OSFA API in front of the same services, with Server Code and Client Code on opposite sides of the network border; the labeled responsibilities are Data Gathering, Formatting, and Delivery, and User Interface Rendering]
Experience-Based Requests
• /ps3/homescreen
[Diagram: the Java API in front of the Recommendations, Movie Data, Similar Movies, Auth, Member Data, A/B Tests, Start-Up, and Ratings services, with the network border marked]
Groovy Layer
[Diagram: the Groovy layer of client adapters on top of the Java API. Labels: Server Code; Client Code; Client Adapter Code (written by client teams, dynamically uploaded to server); Data Gathering; Data Formatting and Delivery; User Interface Rendering]
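In the experience-based model, a device makes one request such as `/ps3/homescreen`, and adapter code on the server gathers and shapes the data for that device. The slides describe these adapters as Groovy code written by client teams and uploaded dynamically; the sketch below uses Java for consistency, and the three service functions and the JSON-like payload are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical server-side adapter behind an experience-based endpoint such as /ps3/homescreen.
final class HomeScreenAdapter {
    // Stand-ins for calls into the Java API's dependent services (names are illustrative).
    private final Function<String, String> queueService;
    private final Function<String, String> recommendationService;
    private final Function<String, String> ratingsService;

    HomeScreenAdapter(Function<String, String> queueService,
                      Function<String, String> recommendationService,
                      Function<String, String> ratingsService) {
        this.queueService = queueService;
        this.recommendationService = recommendationService;
        this.ratingsService = ratingsService;
    }

    /** One device request in, one device-shaped payload out. */
    String ps3HomeScreen(String userId) {
        // Data gathering happens server-side, in parallel, close to the services.
        CompletableFuture<String> queue = CompletableFuture.supplyAsync(() -> queueService.apply(userId));
        CompletableFuture<String> recs = CompletableFuture.supplyAsync(() -> recommendationService.apply(userId));
        CompletableFuture<String> ratings = CompletableFuture.supplyAsync(() -> ratingsService.apply(userId));
        CompletableFuture.allOf(queue, recs, ratings).join();

        // Formatting is chosen for this device (no escaping; values assumed to be safe strings).
        return String.format("{\"queue\":\"%s\",\"recommendations\":\"%s\",\"ratings\":\"%s\"}",
                queue.join(), recs.join(), ratings.join());
    }

    public static void main(String[] args) {
        HomeScreenAdapter adapter = new HomeScreenAdapter(
                id -> "queue-of-" + id, id -> "recs-for-" + id, id -> "ratings-by-" + id);
        System.out.println(adapter.ps3HomeScreen("123"));
    }
}
```

The device pays for a single round trip across the network border, while the team that owns the PS3 experience controls the payload's shape.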
Scaling…
Organization
Systems
Devices
Development
Testing
Dependency Relationships
Testing Philosophy: Act Fast, React Fast
That Doesn’t Mean We Don’t Test
• Unit tests
• Functional tests
• Regression scripts
• Continuous integration
• Capacity planning
• Load / Performance tests
Cloud-Based Deployment Techniques
[Diagram: API requests from the Internet reaching the current code in production]
Single Canary Instance to Test New Code with Production Traffic
(around 1% or less of traffic)
[Diagram sequence: if the canary shows "Error!", it is removed and only the current code in production serves API requests; if it looks "Perfect!", the new code can move toward production]
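The canary step amounts to weighted routing plus a comparison against production. The slides get the share from a single canary instance; the sketch below approximates that with random weighted routing, and the tolerance-based health rule is an assumption rather than Netflix's actual criterion.

```java
import java.util.Random;

// Sketch of canary routing: send a small, configurable fraction of live requests
// to the instance running new code, then compare its error rate to production's.
final class CanaryRouter {
    private final double canaryFraction; // e.g. 0.01 for about 1% of traffic
    private final Random random;

    CanaryRouter(double canaryFraction, Random random) {
        if (canaryFraction < 0.0 || canaryFraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        this.canaryFraction = canaryFraction;
        this.random = random;
    }

    /** True when this request should be served by the canary instance. */
    boolean routeToCanary() {
        return random.nextDouble() < canaryFraction;
    }

    /** Hypothetical rule: the canary may exceed the baseline error rate by at most the tolerance. */
    static boolean canaryLooksHealthy(double canaryErrorRate, double baselineErrorRate, double tolerance) {
        return canaryErrorRate <= baselineErrorRate + tolerance;
    }

    public static void main(String[] args) {
        CanaryRouter router = new CanaryRouter(0.01, new Random(42));
        int requests = 100_000;
        int toCanary = 0;
        for (int i = 0; i < requests; i++) {
            if (router.routeToCanary()) {
                toCanary++;
            }
        }
        System.out.printf("canary share: %.2f%%%n", 100.0 * toCanary / requests);
        System.out.println(canaryLooksHealthy(0.004, 0.003, 0.002) ? "promote" : "pull canary");
    }
}
```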
[Diagram sequence: a cluster of new code is prepared for production alongside the current code, and API requests from the Internet are shifted to it. On "Error!", requests go back to the current code; on "Perfect!", the new code ends up serving all API requests on its own]
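This cluster-level rollout, a pattern sometimes called red/black (or blue/green) deployment, reduces to a routing switch that keeps the current cluster running until the new code is proven. A single-threaded sketch; the error-rate check stands in for whatever production monitoring decides "Error!" versus "Perfect!", and all names are hypothetical.

```java
import java.util.function.DoubleSupplier;

// Single-threaded sketch of switching API traffic between two full clusters.
final class RedBlackDeployment {
    enum Cluster { CURRENT, NEW }

    private Cluster live = Cluster.CURRENT;       // cluster receiving API requests
    private boolean currentClusterRunning = true;  // kept up until the new code is proven

    Cluster live() {
        return live;
    }

    boolean currentClusterRunning() {
        return currentClusterRunning;
    }

    /**
     * Shifts all traffic to the new cluster, then checks its observed error rate.
     * On failure, traffic goes straight back to the untouched current cluster;
     * on success, the current cluster is retired.
     */
    boolean deploy(DoubleSupplier newClusterErrorRate, double maxErrorRate) {
        live = Cluster.NEW;
        if (newClusterErrorRate.getAsDouble() > maxErrorRate) {
            live = Cluster.CURRENT; // rollback is only a routing change
            return false;
        }
        currentClusterRunning = false;
        return true;
    }

    public static void main(String[] args) {
        RedBlackDeployment push = new RedBlackDeployment();
        boolean promoted = push.deploy(() -> 0.002, 0.01);
        System.out.println("promoted=" + promoted + ", live=" + push.live()
                + ", old cluster running=" + push.currentClusterRunning());
    }
}
```

Keeping the old cluster alive during the check is what makes recovery fast: a bad push is undone by pointing traffic back, with no redeploy.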
https://www.github.com/Netflix
Help Wanted!