Santa Cloud: How Netflix Does Holiday Capacity Planning - South Bay SRE Meetup Aug-9-2016
Uploaded by: coburn-watson
Netflix Confidential
Santa Cloud: How Netflix Does Holiday Capacity Planning
August 9th, 2016
Capacity Tightrope Walking
● Additional complexities
○ Volatile customer traffic
○ Option to do regional failover (Chaos Kong)
○ No budgeting process
● Can’t run too lean or too wasteful
● Holiday season = “let’s not get fired” season
Cloud Capacity Planning
● Charter: “Ensure availability of cloud capacity in an efficient manner, allowing engineering organizations to further prioritize innovation and availability.”
● Cross-functional team spanning:
○ Engineering
○ Data Science
○ Finance
Holiday Preparation
● Match the methodology to the environment
○ Bottom-up
○ Top-down
○ Highly iterative
● Engagement with the largest service teams
● Evaluate migrations and coordinate changes
Case Study: API Service
● One of our largest services is changing hardware type
● Service profile:
○ ~20% of our total footprint
○ Autoscales ~65% over the course of a day
○ Runs in all regions
○ CPU bound, memory intensive
Changing The Trough
[Figure: chart with a labeled reservation line]
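The relationship between a daily autoscaling curve and a reservation line can be sketched in a few lines of Python. This is an illustration only, not Netflix's tooling: the sinusoidal demand shape, the peak of 1000 instances, the reservation line of 600, and the function names are all hypothetical, and the ~65% swing is read here as "the trough sits about 65% below the peak", which is one possible interpretation of the slide.

```python
import math

def hourly_demand(peak: int, swing: float, hours: int = 24) -> list[int]:
    """Hypothetical daily demand curve (assumed sinusoidal, peak at hour 20)."""
    trough = peak * (1 - swing)
    mid = (peak + trough) / 2
    amp = (peak - trough) / 2
    return [
        round(mid + amp * math.cos(2 * math.pi * (h - 20) / hours))
        for h in range(hours)
    ]

def idle_reserved(demand: list[int], reservation_line: int) -> list[int]:
    """Reserved instances left unused each hour (capacity others could borrow)."""
    return [max(0, reservation_line - d) for d in demand]

def above_reservation(demand: list[int], reservation_line: int) -> list[int]:
    """Instances needed each hour beyond what is reserved."""
    return [max(0, d - reservation_line) for d in demand]

# Assumed numbers, for illustration only.
demand = hourly_demand(peak=1000, swing=0.65)
RESERVATION_LINE = 600
idle = idle_reserved(demand, RESERVATION_LINE)
extra = above_reservation(demand, RESERVATION_LINE)
```

Under these assumptions, raising or lowering the reservation line trades idle reserved capacity in the trough against the amount that must be found above the line at peak.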
Managing Failover Capacity
● Evolution of the Chaos Kong
● Cascading failovers
● Who gets capacity?
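To make the failover capacity question concrete, here is a minimal sketch of how much extra load surviving regions would carry if one region is evacuated. The proportional redistribution rule, the region names, and the load figures are assumptions for illustration; the slides do not describe how traffic is actually redistributed.

```python
def failover_load(load_by_region: dict[str, float], failed: str) -> dict[str, float]:
    """Spread the failed region's load across survivors in proportion
    to their current load (an assumed policy, not a documented one)."""
    survivors = {r: v for r, v in load_by_region.items() if r != failed}
    total = sum(survivors.values())
    if total == 0:
        raise ValueError("no surviving capacity to absorb the failed region")
    extra = load_by_region[failed]
    return {r: v + extra * v / total for r, v in survivors.items()}

# Hypothetical regions and loads.
steady_state = {"region-a": 100.0, "region-b": 100.0, "region-c": 100.0}
after_failover = failover_load(steady_state, failed="region-a")
```

With three equally loaded regions, each survivor has to absorb half of the failed region's load, which is why failover headroom competes directly with normal peak capacity.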
Our Role In The API Migration
● Through detailed planning:
○ Maximize trough borrowing
○ Highest chance of capacity availability
● Charter: “Ensure availability of cloud capacity in an efficient manner, allowing engineering organizations to further prioritize innovation and availability.”
● “Rinse and repeat” for the other large service teams
Questions