Yow Conference Dec 2013 Netflix Workshop Slides with Notes
Date posted: 16-Sep-2014
Category: Technology
![Page 1: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/1.jpg)
Patterns for Continuous Delivery, High Availability, DevOps & Cloud Native Open Source with NetflixOSS
Workshop with Notes
December 2013
Adrian Cockcroft
@adrianco @NetflixOSS
![Page 2: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/2.jpg)
Presentation vs. Workshop
• Presentation
  – Short duration, focused subject
  – One presenter, many anonymous audience members
  – A few questions at the end
• Workshop
  – Time to explore in and around the subject
  – Tutor gets to know the audience
  – Discussion, rat-holes, “bring out your dead”
![Page 3: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/3.jpg)
Presenter
Adrian Cockcroft
• Technology Fellow
  – From 2014 Battery Ventures
• Cloud Architect
  – From 2007-2013 Netflix
• eBay Research Labs
  – From 2004-2007
• Sun Microsystems
  – HPC Architect
  – Distinguished Engineer
  – Author of four books
  – Performance and Capacity
• BSc Physics and Electronics
  – City University, London
Biography
![Page 4: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/4.jpg)
Attendee Introductions
• Who are you, where do you work
• Why are you here today, what do you need
• “Bring out your dead”
  – Do you have a specific problem or question?
  – One sentence elevator pitch
• What instrument do you play?
![Page 5: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/5.jpg)
Content
Cloud at Scale with Netflix
Cloud Native NetflixOSS
Resilient Developer Patterns
Availability and Efficiency
Questions and Discussion
![Page 6: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/6.jpg)
Netflix Member Web Site Home Page
Personalization Driven – How Does It Work?
![Page 7: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/7.jpg)
How Netflix Used to Work
Customer Device (PC Web browser)
Monolithic Web App
Oracle
MySQL
Monolithic Streaming App
Oracle
MySQL
Limelight/Level 3 Akamai CDNs
Content Management
Content Encoding
Consumer Electronics
AWS Cloud Services
CDN Edge Locations
Datacenter
![Page 8: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/8.jpg)
How Netflix Streaming Works Today
Customer Device (PC, PS3, TV…)
Web Site or Discovery API
User Data
Personalization
Streaming API
DRM
QoS Logging
OpenConnect CDN Boxes
CDN Management and Steering
Content Encoding
Consumer Electronics
AWS Cloud Services
CDN Edge Locations
Datacenter
![Page 9: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/9.jpg)
![Page 10: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/10.jpg)
Netflix Scale
• Tens of thousands of instances on AWS
  – Typically 4 core, 30GByte, Java business logic
  – Thousands created/removed every day
• Thousands of Cassandra NoSQL nodes on AWS
  – Many hi1.4xl - 8 core, 60GByte, 2TByte of SSD
  – 65 different clusters, over 300TB data, triple zone
  – Over 40 are multi-region clusters (6, 9 or 12 zone)
  – Biggest 288 m2.4xl – over 300K rps, 1.3M wps
![Page 11: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/11.jpg)
Reactions over time
2009 “You guys are crazy! Can’t believe it”
2010 “What Netflix is doing won’t work”
2011 “It only works for ‘Unicorns’ like Netflix”
2012 “We’d like to do that but can’t”
2013 “We’re on our way using Netflix OSS code”
![Page 12: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/12.jpg)
Objectives:
Scalability
Availability
Agility
Efficiency
![Page 13: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/13.jpg)
Principles:
Immutability
Separation of Concerns
Anti-fragility
High trust organization
Sharing
![Page 14: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/14.jpg)
Outcomes:
• Public cloud – scalability, agility, sharing
• Micro-services – separation of concerns
• De-normalized data – separation of concerns
• Chaos Engines – anti-fragile operations
• Open source by default – agility, sharing
• Continuous deployment – agility, immutability
• DevOps – high trust organization, sharing
• Run-what-you-wrote – anti-fragile development
![Page 15: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/15.jpg)
When to use public cloud?
![Page 16: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/16.jpg)
![Page 17: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/17.jpg)
"This is the IT swamp draining manual for anyone who is neck deep in alligators."
- Adrian Cockcroft, Cloud Architect at Netflix
![Page 18: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/18.jpg)
Goal of Traditional IT:
Reliable hardware running stable software
![Page 19: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/19.jpg)
SCALE
Breaks hardware
![Page 20: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/20.jpg)
…SPEED
Breaks software
![Page 21: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/21.jpg)
SPEED at SCALE
Breaks everything
![Page 22: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/22.jpg)
![Page 23: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/23.jpg)
Cloud Native
What is it?
Why?
![Page 24: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/24.jpg)
Strive for perfection
Perfect code
Perfect hardware
Perfectly operated
![Page 25: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/25.jpg)
But perfection takes too long
Compromises…
Time to market vs. Quality
Utopia remains out of reach
![Page 26: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/26.jpg)
Where time to market wins big
Making a land-grab
Disrupting competitors (OODA)
Anything delivered as web services
![Page 27: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/27.jpg)
OODA loop (diagram labels)
Phases: Observe, Orient, Decide, Act
Steps: Land grab opportunity, Competitive move, Customer Pain Point, Analysis, Get buy-in, Plan response, Commit resources, Implement, Deliver, Engage customers, Model alternatives, Measure customers
Labels: BIG DATA, INNOVATION, CULTURE, CLOUD
Colonel Boyd, USAF: “Get inside your adversaries' OODA loop to disorient them”
![Page 28: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/28.jpg)
How Soon?
Product features in days instead of months
Deployment in minutes instead of weeks
Incident response in seconds instead of hours
![Page 29: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/29.jpg)
Cloud Native
A new engineering challenge
Construct a highly agile and highly available service from ephemeral and assumed broken components
![Page 30: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/30.jpg)
Inspiration
![Page 31: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/31.jpg)
How to get to Cloud Native
Freedom and Responsibility for Developers
Decentralize and Automate Ops Activities
Integrate DevOps into the Business Organization
Re-Org!
![Page 32: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/32.jpg)
Four Transitions
• Management: Integrated Roles in a Single Organization
  – Business, Development, Operations -> BusDevOps
• Developers: Denormalized Data – NoSQL
  – Decentralized, scalable, available, polyglot
• Responsibility from Ops to Dev: Continuous Delivery
  – Decentralized small daily production updates
• Responsibility from Ops to Dev: Agile Infrastructure - Cloud
  – Hardware in minutes, provisioned directly by developers
![Page 33: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/33.jpg)
The DIY Question
Why doesn’t Netflix build and run its own cloud?
![Page 34: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/34.jpg)
Fitting Into Public Scale
Public Grey Area Private
1,000 Instances 100,000 Instances
Netflix
Facebook
Startups
![Page 35: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/35.jpg)
How big is Public?
AWS upper bound estimate based on the number of public IP Addresses
Every provisioned instance gets a public IP by default (some VPC don’t)
AWS Maximum Possible Instance Count 5.1 Million – Sept 2013
Growth >10x in Three Years, >2x Per Annum - http://bit.ly/awsiprange
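The two growth figures on this slide are consistent with each other, which a quick compound-growth check shows. This is only an arithmetic sketch using the slide's lower bounds (10x over three years); the variable names are illustrative and no underlying IP-range data is used.

```python
# Figures taken from the slide: ">10x in Three Years, >2x Per Annum".
total_growth = 10.0   # lower bound on growth over the whole period
years = 3

# The compound annual growth factor g satisfies g ** years == total_growth.
annual_factor = total_growth ** (1.0 / years)

print(f"implied annual growth factor: {annual_factor:.2f}x")
```

A tenfold increase over three years implies a yearly factor a little above 2, which matches the ">2x Per Annum" claim.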
![Page 36: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/36.jpg)
The Alternative Supplier Question
What if there is no clear leader for a feature, or AWS doesn’t have what we need?
![Page 37: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/37.jpg)
Things We Don’t Use AWS For
SaaS Applications – PagerDuty, OneLogin etc.
Content Delivery Service
DNS Service
![Page 38: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/38.jpg)
CDN Scale
AWS CloudFront
Akamai
Limelight
Level 3
Netflix OpenConnect
YouTube
Gigabits Terabits
Netflix
Facebook
Startups
![Page 39: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/39.jpg)
Content Delivery Service
Open Source Hardware Design + FreeBSD, bird, nginx
see openconnect.netflix.com
![Page 40: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/40.jpg)
DNS Service
AWS Route53 is missing too many features (for now)
Multiple vendor strategy: Dyn, Ultra, Route53
Abstracted (broken) DNS APIs with Denominator
![Page 41: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/41.jpg)
What Changed?
Get out of the way of innovation
Best of breed, by the hour
Choices based on scale
Cost reduction
Slow down developers
Less competitive
Less revenue
Lower margins
Process reduction
Speed up developers
More competitive
More revenue
Higher margins
![Page 42: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/42.jpg)
Getting to Cloud Native
![Page 43: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/43.jpg)
Congratulations, your startup got funding!
• More developers
• More customers
• Higher availability
• Global distribution
• No time….
Growth
![Page 44: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/44.jpg)
AWS Zone A
Your architecture looks like this:
Web UI / Front End API
Middle Tier
RDS/MySQL
![Page 45: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/45.jpg)
And it needs to look more like this…
Cassandra Replicas
Zone A
Cassandra Replicas
Zone B
Cassandra Replicas
Zone C
Regional Load Balancers
Cassandra Replicas
Zone A
Cassandra Replicas
Zone B
Cassandra Replicas
Zone C
Regional Load Balancers
![Page 46: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/46.jpg)
Inside each AWS zone:
Micro-services and de-normalized data stores
API or Web Calls
memcached
Cassandra
Web service
S3 bucket
![Page 47: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/47.jpg)
We’re here to help you get to global scale…
Apache Licensed Cloud Native OSS Platform
http://netflix.github.com
![Page 48: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/48.jpg)
Technical Indigestion – what do all these do?
![Page 49: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/49.jpg)
Updated site – make it easier to find what you need
![Page 50: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/50.jpg)
Getting started with NetflixOSS Step by Step
1. Set up AWS Accounts to get the foundation in place
2. Security and access management setup
3. Account Management: Asgard to deploy & Ice for cost monitoring
4. Build Tools: Aminator to automate baking AMIs
5. Service Registry and Searchable Account History: Eureka & Edda
6. Configuration Management: Archaius dynamic property system
7. Data storage: Cassandra, Astyanax, Priam, EVCache
8. Dynamic traffic routing: Denominator, Zuul, Ribbon, Karyon
9. Availability: Simian Army (Chaos Monkey), Hystrix, Turbine
10. Developer productivity: Blitz4J, GCViz, Pytheas, RxJava
11. Big Data: Genie for Hadoop PaaS, Lipstick visualizer for Pig
12. Sample Apps to get started: RSS Reader, ACME Air, FluxCapacitor
![Page 51: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/51.jpg)
AWS Account Setup
![Page 52: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/52.jpg)
Flow of Code and Data Between AWS Accounts
Production Account
Archive Account
Auditable Account
Dev Test Build Account
AMI
AMI
Backup Data to S3
Weekend S3 restore
New Code
Backup Data to S3
![Page 53: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/53.jpg)
Account Security
• Protect Accounts
  – Two factor authentication for primary login
• Delegated Minimum Privilege
  – Create IAM roles for everything
• Security Groups
  – Control who can call your services
![Page 54: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/54.jpg)
Cloud Access Control
www-prod
• Userid wwwprod
Dal-prod
• Userid dalprod
Cass-prod
• Userid cassprod
Cloud access audit log ssh/sudo bastion
Security groups don’t allow ssh between instances
Developers
![Page 55: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/55.jpg)
Tooling and Infrastructure
![Page 56: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/56.jpg)
Fast Start Amazon Machine Images
https://github.com/Answers4AWS/netflixoss-ansible/wiki/AMIs-for-NetflixOSS
• Pre-built AMIs for
  – Asgard – developer self service deployment console
  – Aminator – build system to bake code onto AMIs
  – Edda – historical configuration database
  – Eureka – service registry
  – Simian Army – Janitor Monkey, Chaos Monkey, Conformity Monkey
• NetflixOSS Cloud Prize Winner
  – Produced by Answers4AWS – Peter Sankauskas
![Page 57: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/57.jpg)
Fast Setup CloudFormation Templates
http://answersforaws.com/resources/netflixoss/cloudformation/
• CloudFormation templates for
  – Asgard – developer self service deployment console
  – Aminator – build system to bake code onto AMIs
  – Edda – historical configuration database
  – Eureka – service registry
  – Simian Army – Janitor Monkey for cleanup,
![Page 58: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/58.jpg)
CloudFormation Walk-Through for Asgard
(Repeat for Prod, Test and Audit Accounts)
![Page 59: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/59.jpg)
![Page 60: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/60.jpg)
Setting up Asgard – Step 1 Create New Stack
![Page 61: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/61.jpg)
Setting up Asgard – Step 2 Select Template
![Page 62: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/62.jpg)
Setting up Asgard – Step 3 Enter IP & Keys
![Page 63: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/63.jpg)
Setting up Asgard – Step 4 Skip Tags
![Page 64: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/64.jpg)
Setting up Asgard – Step 5 Confirm
![Page 65: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/65.jpg)
Setting up Asgard – Step 6 Watch CloudFormation
![Page 66: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/66.jpg)
Setting up Asgard – Step 7 Find PublicDNS Name
![Page 67: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/67.jpg)
Open Asgard – Step 8 Enter Credentials
![Page 68: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/68.jpg)
Use Asgard – AWS Self Service Portal
![Page 69: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/69.jpg)
Use Asgard - Manage Red/Black Deployments
![Page 70: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/70.jpg)
Track AWS Spend in Detail with ICE
![Page 71: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/71.jpg)
Ice – Slice and dice detailed costs and usage
![Page 72: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/72.jpg)
Setting up ICE
• Visit GitHub site for instructions
• Currently depends on Highcharts
  – Non-open source package license
  – Free for non-commercial use
  – Download and license your own copy
  – We can’t provide a pre-built AMI – sorry!
• Long term plan to make ICE fully OSS
  – Anyone want to help?
![Page 73: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/73.jpg)
Build Pipeline Automation
Jenkins in the Cloud auto-builds NetflixOSS Pull Requests
http://www.cloudbees.com/jenkins
![Page 74: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/74.jpg)
Automatically Baking AMIs with Aminator
• AutoScaleGroup instances should be identical
• Base plus code/config
• Immutable instances
• Works for 1 or 1000…
• Aminator Launch
  – Use Asgard to start AMI or
  – CloudFormation Recipe
![Page 75: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/75.jpg)
Discovering your Services - Eureka
• Map applications by name to
  – AMI, instances, Zones
  – IP addresses, URLs, ports
  – Keep track of healthy, unhealthy and initializing instances
• Eureka Launch
  – Use Asgard to launch AMI or use CloudFormation Template
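The mapping described on this slide (application name to instances, with a health state per instance) can be pictured with a toy in-memory registry. This is a minimal sketch, not Eureka's API: the class names, status values, AMI IDs and addresses are all made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    STARTING = "starting"   # instance is initializing
    UP = "up"               # healthy
    DOWN = "down"           # unhealthy


@dataclass
class Instance:
    instance_id: str
    ami: str
    zone: str
    ip: str
    port: int
    status: Status = Status.STARTING


class Registry:
    """Toy in-memory registry keyed by application name (illustrative only)."""

    def __init__(self):
        self._apps = {}

    def register(self, app, instance):
        self._apps.setdefault(app, {})[instance.instance_id] = instance

    def set_status(self, app, instance_id, status):
        self._apps[app][instance_id].status = status

    def healthy(self, app):
        """Return the instances of `app` that are currently UP."""
        return [i for i in self._apps.get(app, {}).values() if i.status is Status.UP]


registry = Registry()
registry.register("www", Instance("i-0123456789", "ami-11111111", "us-east-1a", "10.0.0.1", 7001))
registry.register("www", Instance("i-012345678a", "ami-11111111", "us-east-1b", "10.0.0.2", 7001))
registry.set_status("www", "i-0123456789", Status.UP)

# Callers look up an application by name and only see healthy instances.
healthy_urls = [f"http://{i.ip}:{i.port}/" for i in registry.healthy("www")]
print(healthy_urls)
```

The second instance stays in its initializing state, so a lookup by name returns only the first one.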
![Page 76: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/76.jpg)
Deploying Eureka Service – 1 per Zone
![Page 77: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/77.jpg)
Edda
AWS Instances, ASGs, etc.
Eureka Services metadata
Your Own Custom State
Searchable state history for a Region / Account
Monkeys
Timestamped delta cache of JSON describe call results for anything of interest…
Edda Launch
Use Asgard to launch AMI or use CloudFormation Template
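One way to picture a "timestamped delta cache" is a store that keeps a new entry only when a resource's JSON description changes, and can answer "what did it look like at time T". The sketch below is an assumption about the idea, not Edda's implementation; `DeltaHistory` and its methods are hypothetical names.

```python
import bisect
import json


class DeltaHistory:
    """Keeps one timestamped entry per *change* of a resource's JSON description."""

    def __init__(self):
        self._times = {}   # resource id -> sorted list of timestamps
        self._docs = {}    # resource id -> serialized JSON docs, parallel to _times

    def record(self, resource_id, timestamp, description):
        """Store `description` only if it differs from the latest stored one.

        Timestamps are assumed to be non-decreasing per resource.
        """
        doc = json.dumps(description, sort_keys=True)
        docs = self._docs.setdefault(resource_id, [])
        times = self._times.setdefault(resource_id, [])
        if docs and docs[-1] == doc:
            return False            # unchanged since the last crawl, nothing to store
        times.append(timestamp)
        docs.append(doc)
        return True

    def at(self, resource_id, timestamp):
        """Return the description that was current at `timestamp`, or None."""
        times = self._times.get(resource_id, [])
        idx = bisect.bisect_right(times, timestamp) - 1
        if idx < 0:
            return None
        return json.loads(self._docs[resource_id][idx])


history = DeltaHistory()
history.record("sg-0123456789", 100, {"ipRanges": ["10.10.1.1/32", "10.10.1.4/32"]})
history.record("sg-0123456789", 200, {"ipRanges": ["10.10.1.1/32", "10.10.1.4/32"]})  # no change
history.record("sg-0123456789", 300, {"ipRanges": ["10.10.1.1/32", "10.10.1.3/32"]})
print(history.at("sg-0123456789", 250))
```

Storing only changes keeps the history small even when resources are polled frequently.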
![Page 78: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/78.jpg)
Edda Query Examples
Find any instances that have ever had a specific public IP address
$ curl "http://edda/api/v2/view/instances;publicIpAddress=1.2.3.4;_since=0"
["i-0123456789","i-012345678a","i-012345678b"]
Show the most recent change to a security group
$ curl "http://edda/api/v2/aws/securityGroups/sg-0123456789;_diff;_all;_limit=2"
--- /api/v2/aws.securityGroups/sg-0123456789;_pp;_at=1351040779810
+++ /api/v2/aws.securityGroups/sg-0123456789;_pp;_at=1351044093504
@@ -1,33 +1,33 @@
 {
…
   "ipRanges" : [
     "10.10.1.1/32",
     "10.10.1.2/32",
+    "10.10.1.3/32",
-    "10.10.1.4/32"
…
 }
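Both queries above use the same URL shape: a collection path followed by `;key=value` or bare `;flag` arguments. A small helper makes that pattern explicit. `edda_url` is a hypothetical function written for this note, and the host `edda` is simply the one shown on the slide; only the argument names that appear in the examples are used.

```python
def edda_url(base, collection, resource_id=None, **matrix_args):
    """Build an Edda-style URL with ';key=value' or ';flag' matrix arguments.

    A value of True emits a bare flag such as ';_diff'.
    """
    path = f"{base}/{collection}"
    if resource_id is not None:
        path += f"/{resource_id}"
    for key, value in matrix_args.items():
        path += f";{key}" if value is True else f";{key}={value}"
    return path


url_by_ip = edda_url("http://edda/api/v2", "view/instances",
                     publicIpAddress="1.2.3.4", _since=0)
url_diff = edda_url("http://edda/api/v2", "aws/securityGroups", "sg-0123456789",
                    _diff=True, _all=True, _limit=2)
print(url_by_ip)
print(url_diff)
```

The two printed URLs reproduce the ones passed to curl in the examples.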
![Page 79: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/79.jpg)
Archaius – Property Console
![Page 80: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/80.jpg)
Archaius library – configuration management
SimpleDB or DynamoDB for NetflixOSS. Netflix uses Cassandra for multi-region…
Based on Pytheas. Not open sourced yet
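The idea behind a dynamic property system is that code holds a handle to a named property and sees new values at runtime without a redeploy. The sketch below illustrates that idea only; it is not the Archaius API, and `PropertyStore`, `DynamicProperty` and the property name are hypothetical. A real system would poll a backing store such as the ones named above rather than being set in-process.

```python
class PropertyStore:
    """Toy stand-in for a polled configuration source (illustrative only)."""

    def __init__(self):
        self._values = {}
        self._listeners = {}

    def set(self, name, value):
        """Update a property and notify anyone watching it."""
        self._values[name] = value
        for callback in self._listeners.get(name, []):
            callback(value)

    def get(self, name, default):
        return self._values.get(name, default)

    def watch(self, name, callback):
        self._listeners.setdefault(name, []).append(callback)


class DynamicProperty:
    """Handle that always reflects the store's current value."""

    def __init__(self, store, name, default):
        self._store, self._name, self._default = store, name, default

    def get(self):
        return self._store.get(self._name, self._default)


store = PropertyStore()
timeout_ms = DynamicProperty(store, "client.timeout.ms", 1000)
seen = []
store.watch("client.timeout.ms", seen.append)

before = timeout_ms.get()             # default, nothing set yet
store.set("client.timeout.ms", 250)   # e.g. an operator changes the value at runtime
after = timeout_ms.get()
print(before, after, seen)
```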
![Page 81: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/81.jpg)
Data Storage and Access
![Page 82: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/82.jpg)
Data Storage Options
• RDS for MySQL
  – Deploy using Asgard
• DynamoDB
  – Fast, easy to setup and scales up from a very low cost base
• Cassandra
  – Provides portability, multi-region support, very large scale
  – Storage model supports incremental/immutable backups
  – Priam: easy deploy automation for Cassandra on AWS
![Page 83: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/83.jpg)
Priam – Cassandra co-process
• Runs alongside Cassandra on each instance
• Fully distributed, no central master coordination
• S3 Based backup and recovery automation
• Bootstrapping and automated token assignment.
• Centralized configuration management
• RESTful monitoring and metrics
• Underlying config in SimpleDB
  – Netflix uses Cassandra “turtle” for Multi-region
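To give a feel for what "automated token assignment" saves an operator from doing by hand, here is the classic evenly spaced token calculation for a ring of N nodes. This is a simplified sketch and not Priam's actual algorithm (which, for example, has to account for zones and regions); it assumes Cassandra's RandomPartitioner token range of 0 to 2^127.

```python
RING_SIZE = 2 ** 127  # token range of Cassandra's RandomPartitioner


def evenly_spaced_tokens(node_count):
    """Return one token per node, spread evenly around the ring."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return [i * RING_SIZE // node_count for i in range(node_count)]


tokens = evenly_spaced_tokens(6)
for slot, token in enumerate(tokens):
    print(slot, token)
```

Evenly spaced tokens give each node an equal share of the key space, which is why getting the assignment right matters when nodes are created and replaced automatically.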
![Page 84: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/84.jpg)
Astyanax Cassandra Client for Java
• Features
  – Abstraction of connection pool from RPC protocol
  – Fluent Style API
  – Operation retry with backoff
  – Token aware
  – Batch manager
  – Many useful recipes
  – Entity Mapper based on JPA annotations
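"Operation retry with backoff" is a general client-side pattern: retry a failed call, waiting longer after each failure. The sketch below shows the pattern in isolation and is not Astyanax code; the function name, the choice of `ConnectionError` as the retryable error, and the delay values are assumptions for illustration.

```python
import time


def call_with_backoff(operation, max_attempts=4, base_delay=0.05, sleep=time.sleep):
    """Call `operation` until it succeeds or `max_attempts` is exhausted.

    The wait doubles after every failure: base_delay, 2*base_delay, ...
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise                      # out of attempts, surface the error
            sleep(base_delay * (2 ** attempt))


# Usage: an operation that fails twice, then succeeds.
attempts = {"count": 0}
delays = []


def flaky_read():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated timeout")
    return "row-data"


# Passing delays.append as `sleep` records the waits instead of really sleeping.
result = call_with_backoff(flaky_read, sleep=delays.append)
print(result, delays)
```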
![Page 85: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/85.jpg)
Cassandra Astyanax Recipes
• Distributed row lock (without needing zookeeper)
• Multi-region row lock
• Uniqueness constraint
• Multi-row uniqueness constraint
• Chunked and multi-threaded large file storage
• Reverse index search
• All rows query
• Durable message queue
• Contributed: High cardinality reverse index
![Page 86: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/86.jpg)
EVCache - Low latency data access
• multi-AZ and multi-Region replication
• Ephemeral data, session state (sort of)
• Client code
• Memcached
![Page 87: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/87.jpg)
Routing Customers to Code
![Page 88: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/88.jpg)
Denominator: DNS for Multi-Region Availability
Cassandra Replicas
Zone A
Cassandra Replicas
Zone B
Cassandra Replicas
Zone C
Cassandra Replicas
Zone A
Cassandra Replicas
Zone B
Cassandra Replicas
Zone C
Denominator – manage traffic via multiple DNS providers with Java code
Regional Load Balancers Regional Load Balancers
UltraDNS DynECT DNS
AWS Route53
Denominator
Zuul API Router
![Page 89: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/89.jpg)
Zuul – Smart and Scalable Routing Layer
![Page 90: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/90.jpg)
Ribbon library for internal request routing
![Page 91: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/91.jpg)
Ribbon – Zone Aware LB
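A zone-aware load balancer prefers servers in the caller's own availability zone and only crosses zones when it has to. The sketch below shows that preference with simple round-robin; it is not Ribbon's implementation (which may also consider zone health and load), and the class name, addresses and zone names are made up.

```python
import itertools


class ZoneAwareBalancer:
    """Round-robin over servers in the caller's zone, falling back to all zones."""

    def __init__(self, my_zone, servers):
        # servers: list of (address, zone, is_up) tuples; all values are made up.
        self._my_zone = my_zone
        self._servers = servers
        self._counter = itertools.count()

    def choose(self):
        up = [s for s in self._servers if s[2]]
        local = [s for s in up if s[1] == self._my_zone]
        candidates = local or up          # prefer same zone, else any healthy server
        if not candidates:
            raise RuntimeError("no healthy servers")
        return candidates[next(self._counter) % len(candidates)][0]


lb = ZoneAwareBalancer("us-east-1a", [
    ("10.0.0.1:7001", "us-east-1a", True),
    ("10.0.0.2:7001", "us-east-1a", True),
    ("10.0.1.1:7001", "us-east-1b", True),
])
picks = [lb.choose() for _ in range(4)]
print(picks)
```

Keeping traffic in-zone avoids cross-zone latency, while the fallback keeps the caller working if its own zone has no healthy servers.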
![Page 92: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/92.jpg)
Karyon - Common server container
• Bootstrapping
  o Dependency & Lifecycle management via Governator.
  o Service registry via Eureka.
  o Property management via Archaius
  o Hooks for Latency Monkey testing
  o Preconfigured status page and healthcheck servlets
![Page 93: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/93.jpg)
• Embedded Status Page Console
  o Environment
  o Eureka
  o JMX
Karyon
![Page 94: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/94.jpg)
Availability
![Page 95: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/95.jpg)
Either you break it, or users will
![Page 96: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/96.jpg)
Add some Chaos to your system
![Page 97: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/97.jpg)
Clean up your room! – Janitor Monkey
Works with Edda history to clean up after Asgard
![Page 98: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/98.jpg)
Conformity Monkey
Track and alert for old code versions and known issues
Walks Karyon status pages found via Edda
![Page 99: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/99.jpg)
Hystrix Circuit Breaker: Fail Fast -> recover fast
![Page 100: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/100.jpg)
Hystrix Circuit Breaker State Flow
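The state flow on this slide can be sketched as a small state machine. This is a minimal illustration in Python, not the Hystrix API: the class name, the consecutive-failure threshold, and the injectable clock are all assumptions made for the example (Hystrix itself trips on an error percentage over a rolling window).

```python
import time


class CircuitBreaker:
    """Hypothetical three-state breaker: CLOSED -> OPEN -> HALF_OPEN -> CLOSED."""

    def __init__(self, failure_threshold=3, reset_timeout=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before tripping
        self.reset_timeout = reset_timeout          # seconds to stay open before a trial call
        self.clock = clock                          # injectable so the sketch can be tested
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = None

    def call(self, func, fallback):
        if self.state == "OPEN":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"  # let a single trial request through
            else:
                return fallback()         # fail fast without touching the dependency
        try:
            result = func()
        except Exception:
            self._on_failure()
            return fallback()
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed trial call, or too many failures in a row, (re)opens the circuit.
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()

    def _on_success(self):
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.state = "CLOSED"
```

While the circuit is open the caller gets the fallback immediately, which is the "fail fast" half of the slide title; the half-open trial call is what allows "recover fast" once the dependency is healthy again.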
![Page 101: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/101.jpg)
Turbine Dashboard
Per Second Update Circuit Breakers in a Web Browser
![Page 102: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/102.jpg)
Developer Productivity
![Page 103: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/103.jpg)
Blitz4J – Non-blocking Logging
• Better handling of log messages during storms
• Replace sync with concurrent data structures
• Extreme configurability
• Isolation of app threads from logging threads
![Page 104: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/104.jpg)
JVM Garbage Collection issues? GCViz!
• Convenient
• Visual
• Causation
• Clarity
• Iterative
![Page 105: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/105.jpg)
Pytheas – OSS based tooling framework
• Guice
• Jersey
• FreeMarker
• JQuery
• DataTables
• D3
• JQuery-UI
• Bootstrap
![Page 106: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/106.jpg)
RxJava - Functional Reactive Programming
• A Simpler Approach to Concurrency
– Use Observable as a simple stable composable abstraction
• Observable Service Layer enables any of
– conditionally return immediately from a cache
– block instead of using threads if resources are constrained
– use multiple threads
– use non-blocking IO
– migrate an underlying implementation from network based to in-memory cache
![Page 107: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/107.jpg)
Big Data and Analytics
![Page 108: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/108.jpg)
Hadoop jobs - Genie
![Page 109: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/109.jpg)
Lipstick - Visualization for Pig queries
![Page 110: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/110.jpg)
Suro Event Pipeline
1.5 Million events/s
80 Billion events/day
Cloud native, dynamic, configurable offline and realtime data sinks
Error rate alerting
![Page 111: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/111.jpg)
Putting it all together…
![Page 112: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/112.jpg)
Sample Application – RSS Reader
![Page 113: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/113.jpg)
3rd Party Sample App by Chris Fregly
fluxcapacitor.com
Flux Capacitor is a Java-based reference app using:
• archaius (zookeeper-based dynamic configuration)
• astyanax (cassandra client)
• blitz4j (asynchronous logging)
• curator (zookeeper client)
• eureka (discovery service)
• exhibitor (zookeeper administration)
• governator (guice-based DI extensions)
• hystrix (circuit breaker)
• karyon (common base web service)
• ribbon (eureka-based REST client)
• servo (metrics client)
• turbine (metrics aggregation)
Flux also integrates popular open source tools such as Graphite, Jersey, Jetty, Netty, and Tomcat.
![Page 114: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/114.jpg)
3rd party Sample App by IBM
https://github.com/aspyker/acmeair-netflix/
![Page 115: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/115.jpg)
NetflixOSS Project Categories
![Page 116: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/116.jpg)
NetflixOSS Continuous Build and Deployment
Diagram components:
• Github NetflixOSS Source
• AWS Base AMI
• Maven Central
• Cloudbees Jenkins
• Aminator Bakery
• Dynaslave AWS Build Slaves
• Asgard (+ Frigga) Console
• AWS Baked AMIs
• Glisten Workflow DSL
• AWS Account
![Page 117: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/117.jpg)
NetflixOSS Services Scope
Diagram components:
• AWS Account
• Asgard Console
• Multiple AWS Regions
• Eureka Registry
• 3 AWS Zones
• Application Clusters, Autoscale Groups
• Instances
![Page 118: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/118.jpg)
NetflixOSS Instance Libraries
Initialization
• Baked AMI – Tomcat, Apache, your code
• Governator – Guice based dependency injection
• Archaius – dynamic configuration properties client
• Eureka – service registration client
Service Requests
• Karyon – Base Server for inbound requests
• RxJava – Reactive pattern
• Hystrix/Turbine – dependencies and real-time status
• Ribbon and Feign – REST Clients for outbound calls
Data Access
• Astyanax – Cassandra client and pattern library
• Evcache – Zone aware Memcached client
• Curator – Zookeeper patterns
• Denominator – DNS routing abstraction
Logging
• Blitz4j – non-blocking logging
• Servo – metrics export for autoscaling
• Atlas – high volume instrumentation
![Page 119: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/119.jpg)
NetflixOSS Testing and Automation
Test Tools
• CassJmeter – Load testing for Cassandra
• Circus Monkey – Test account reservation rebalancing
Maintenance
• Janitor Monkey – Cleans up unused resources
• Efficiency Monkey
• Doctor Monkey
• Howler Monkey – Complains about AWS limits
Availability
• Chaos Monkey – Kills Instances
• Chaos Gorilla – Kills Availability Zones
• Chaos Kong – Kills Regions
• Latency Monkey – Latency and error injection
Security
• Conformity Monkey – architectural pattern warnings
• Security Monkey – security group and S3 bucket permissions
![Page 120: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/120.jpg)
Vendor Driven Portability
Interest in using NetflixOSS for Enterprise Private Clouds
• “It’s done when it runs Asgard”
– Functionally complete
– Demonstrated March 2013
– Released June 2013 in V3.3
• Vendor and end user interest
– Openstack “Heat” getting there
– Paypal C3 Console based on Asgard
• IBM Example application “Acme Air”
– Based on NetflixOSS running on AWS
– Ported to IBM Softlayer with Rightscale
![Page 121: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/121.jpg)
Some of the companies using NetflixOSS
(There are many more, please send us your logo!)
![Page 122: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/122.jpg)
Use NetflixOSS to scale your startup or enterprise
Contribute to existing github projects and add your own
![Page 123: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/123.jpg)
Resilient API Patterns
Switch to Ben’s Slides
![Page 124: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/124.jpg)
Availability
Is it running yet?
How many places is it running in?
How far apart are those places?
![Page 125: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/125.jpg)
![Page 126: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/126.jpg)
Netflix Outages
• Running very fast with scissors
– Mostly self inflicted – bugs, mistakes from pace of change
– Some caused by AWS bugs and mistakes
• Incident Life-cycle Management by Platform Team
– No runbooks, no operational changes by the SREs
– Tools to identify what broke and call the right developer
• Next step is multi-region active/active
– Investigating and building in stages during 2013
– Could have prevented some of our 2012 outages
![Page 127: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/127.jpg)
Incidents – Impact and Mitigation
Impact levels (fewest to most incidents):
• PR: X Incidents – Public Relations Media Impact
• CS: XX Incidents – High Customer Service Calls
• Metrics impact – Feature disable: XXX Incidents – Affects AB Test Results
• No Impact – fast retry or automated failover: XXXX Incidents
Mitigation:
• Y incidents mitigated by Active Active, game day practicing
• YY incidents mitigated by better tools and practices
• YYY incidents mitigated by better data tagging
![Page 128: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/128.jpg)
Real Web Server Dependencies Flow
(Netflix Home page business transaction as seen by AppDynamics)
Diagram labels: Start Here; Personalization movie group choosers (for US, Canada and Latam)
Legend: memcached, Cassandra, Web service, S3 bucket
Each icon is three to a few hundred instances across three AWS zones
![Page 129: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/129.jpg)
Three Balanced Availability Zones
Test with Chaos Gorilla
Diagram: Load Balancers feeding Zone A, Zone B and Zone C, each with Cassandra and Evcache Replicas; Chaos Gorilla
![Page 130: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/130.jpg)
Isolated Regions
Diagram: US-East Load Balancers and EU-West Load Balancers, each feeding Cassandra Replicas in Zone A, Zone B and Zone C of its own region
![Page 131: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/131.jpg)
Highly Available NoSQL Storage
A highly scalable, available and durable deployment pattern based on Apache Cassandra
![Page 132: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/132.jpg)
Single Function Micro-Service Pattern
One keyspace, replaces a single table or materialized view
Diagram components:
• Many Different Single-Function REST Clients
• Stateless Data Access REST Service, Astyanax Cassandra Client
• Single function Cassandra Cluster Managed by Priam, between 6 and 288 nodes
• Optional Datacenter Update Flow
Each icon represents a horizontally scaled service of three to hundreds of instances deployed over three availability zones
• Over 60 Cassandra clusters
• Over 2000 nodes
• Over 300TB data
• Over 1M writes/s/cluster
![Page 133: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/133.jpg)
Stateless Micro-Service Architecture
Diagram layers:
• Linux Base AMI (CentOS or Ubuntu)
• Optional Apache frontend, memcached, non-java apps
• Java (JDK 6 or 7)
• Java monitoring
• Tomcat
• Application war file, base servlet, platform, client interface jars, Astyanax
![Page 134: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/134.jpg)
Cassandra Instance Architecture
Diagram layers:
• Linux Base AMI (CentOS or Ubuntu)
• Tomcat and Priam on JDK
• Healthcheck, Status
• Java (JDK 7)
• Java monitoring
• Cassandra Server
• Local Ephemeral Disk Space – 2TB of SSD or 1.6TB disk holding Commit log and SSTables
![Page 135: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/135.jpg)
Apache Cassandra
• Scalable and Stable in large deployments
– No additional license cost for large scale!
– Optimized for “OLTP” vs. Hbase optimized for “DSS”
• Available during Partition (AP from CAP)
– Hinted handoff repairs most transient issues
– Read-repair and periodic repair keep it clean
• Quorum and Client Generated Timestamp
– Read after write consistency with 2 of 3 copies
– Latest version includes Paxos for stronger transactions
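The "2 of 3 copies" point follows from the usual overlap rule: a read is guaranteed to see the latest acknowledged write when the read and write replica counts add up to more than the replication factor (R + W > N). A short sketch of that arithmetic, with function names invented for the example:

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def read_sees_latest_write(n: int, w: int, r: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    return r + w > n


n = 3
q = quorum(n)
print(q, read_sees_latest_write(n, w=q, r=q))  # 2 True
print(read_sees_latest_write(n, w=1, r=1))     # False
```

With three copies, quorum writes and quorum reads each touch two replicas, so at least one replica in any read set holds the newest write; the client generated timestamp then decides which version wins.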
![Page 136: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/136.jpg)
Astyanax - Cassandra Write Data Flows
Single Region, Multiple Availability Zone, Token Aware
Token Aware Clients
1. Client writes to local coordinator
2. Coordinator writes to other zones
3. Nodes return ack
4. Data written to internal commit log disks (no more than 10 seconds later)
If a node goes offline, hinted handoff completes the write when the node comes back up.
Requests can choose to wait for one node, a quorum, or all nodes to ack the write
SSTable disk writes and compactions occur asynchronously
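A token aware client works out which nodes own a row key before sending the write, so it can pick a coordinator that is itself a replica. The sketch below is a simplified illustration, not Astyanax or Cassandra code: the `Ring` class, the MD5-based `token` function, and the one-replica-per-zone walk are assumptions chosen to mirror the three-zone layout on the slide.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a position on the ring (MD5 is an assumption for this sketch)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")


class Ring:
    """Hypothetical token ring with one token per node."""

    def __init__(self, nodes):
        # nodes: list of (name, zone); each node's token is derived from its name
        self.entries = sorted((token(name), name, zone) for name, zone in nodes)
        self.tokens = [t for t, _, _ in self.entries]

    def replicas(self, key, rf=3):
        """Walk clockwise from the key's token, taking the first node seen in each zone."""
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.entries)
        chosen, zones = [], set()
        for i in range(len(self.entries)):
            _, name, zone = self.entries[(start + i) % len(self.entries)]
            if zone not in zones:
                chosen.append(name)
                zones.add(zone)
            if len(chosen) == rf:
                break
        return chosen


nodes = [(f"node-{zone}{i}", zone) for zone in "abc" for i in range(2)]
ring = Ring(nodes)
print(ring.replicas("subscriber:42"))  # three node names, one per zone
```

The first entry in the returned list plays the role of the local coordinator in step 1; the remaining entries are the other-zone replicas it forwards to in step 2.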
![Page 137: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/137.jpg)
Data Flows for Multi-Region Writes
Token Aware, Consistency Level = Local Quorum
Diagram labels: US Clients, EU Clients, 100+ms latency
1. Client writes to local replicas
2. Local write acks returned to Client which continues when 2 of 3 local nodes are committed
3. Local coordinator writes to remote coordinator.
4. When data arrives, remote coordinator node acks and copies to other remote zones
5. Remote nodes ack to local coordinator
6. Data flushed to internal commit log disks (no more than 10 seconds later)
If a node or region goes offline, hinted handoff completes the write when the node comes back up.
Nightly global compare and repair jobs ensure everything stays consistent.
![Page 138: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/138.jpg)
Cassandra at Scale
Benchmarking to Retire Risk
More?
![Page 139: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/139.jpg)
Scalability from 48 to 288 nodes on AWS
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
Chart (2011): Client Writes/s by node count – Replication Factor = 3
Plotted values, in order of increasing node count: 174373, 366828, 537172, 1099837 writes/s
Used 288 of m1.xlarge
4 CPU, 15 GB RAM, 8 ECU
Cassandra 0.8.6
Benchmark config only existed for about 1hr
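The near-linear scaling claim can be checked from the chart's endpoints. The sketch below assumes the smallest and largest plotted values belong to the 48-node and 288-node runs named in the slide title; the variable names are illustrative.

```python
# Endpoints of the chart: (node count, client writes/s).
# Pairing these values with 48 and 288 nodes is an assumption based on the slide title.
small = (48, 174_373)
large = (288, 1_099_837)

per_node_small = small[1] / small[0]
per_node_large = large[1] / large[0]
node_ratio = large[0] / small[0]
throughput_ratio = large[1] / small[1]

print(f"{per_node_small:.0f} writes/s per node at {small[0]} nodes")
print(f"{per_node_large:.0f} writes/s per node at {large[0]} nodes")
print(f"{node_ratio:.1f}x nodes gave {throughput_ratio:.2f}x throughput")
```

Per-node throughput stays in the range of roughly 3,600 to 3,800 writes/s at both ends, which is what linear scale-out looks like: six times the nodes delivered slightly more than six times the writes.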
![Page 140: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/140.jpg)
Cassandra Disk vs. SSD Benchmark
Same Throughput, Lower Latency, Half Cost
http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
Diagram (2012): one Load Test Driver feeding two configurations
• REST service, 36x m2.xlarge EVcache, 48x m2.4xlarge Cassandra
• REST service, 15x hi1.4xlarge Cassandra
Tiers: Load Generation, Application, Memcached, Cassandra
![Page 141: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/141.jpg)
2013 - Cross Region Use Cases
• Geographic Isolation
– US to Europe replication of subscriber data
– Read intensive, low update rate
– Production use since late 2011
• Redundancy for regional failover
– US East to US West replication of everything
– Includes write intensive data, high update rate
– Testing now
![Page 142: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/142.jpg)
Benchmarking Global Cassandra
Write intensive test of cross region replication capacity
16 x hi1.4xlarge SSD nodes per zone = 96 total
192 TB of SSD in six locations up and running Cassandra in 20 minutes
Diagram: Cassandra Replicas in Zone A, Zone B and Zone C of US-West-2 Region - Oregon and of US-East-1 Region - Virginia
• Test Load: 1 Million writes, CL.ONE (wait for one replica to ack)
• Validation Load: 1 Million reads after 500ms, CL.ONE with no data loss
• Inter-Zone Traffic
• Inter-Region Traffic: up to 9Gbits/s, 83ms
• 18TB backups from S3
![Page 143: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/143.jpg)
Copying 18TB from East to West
Cassandra bootstrap 9.3 Gbit/s single threaded 48 nodes to 48 nodes
Thanks to boundary.com for these network analysis plots
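As a rough sanity check on these figures, the copy time can be estimated from the data size and the aggregate rate. This sketch assumes decimal terabytes and that 9.3 Gbit/s was sustained for the whole transfer, neither of which the slide states.

```python
data_bytes = 18e12        # 18 TB, assuming decimal terabytes
rate_bits_per_s = 9.3e9   # 9.3 Gbit/s aggregate, assumed to be sustained throughout
node_pairs = 48           # 48 nodes streaming to 48 nodes

seconds = data_bytes * 8 / rate_bits_per_s
hours = seconds / 3600
per_pair_mbit = rate_bits_per_s / node_pairs / 1e6

print(f"about {hours:.1f} hours to copy")
print(f"about {per_pair_mbit:.0f} Mbit/s per node pair")
```

Under those assumptions the bootstrap takes a little over four hours, with each node pair moving data at just under 200 Mbit/s.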
![Page 144: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/144.jpg)
Inter Region Traffic Test
Verified at desired capacity, no problems, 339 MB/s, 83ms latency
![Page 145: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/145.jpg)
Ramp Up Load Until It Breaks!
Unmodified tuning, dropping client data at 1.93GB/s inter region traffic
Spare CPU, IOPS, Network, just need some Cassandra tuning for more
![Page 146: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/146.jpg)
Failure Modes and Effects

| Failure Mode | Probability | Current Mitigation Plan |
|---|---|---|
| Application Failure | High | Automatic degraded response |
| AWS Region Failure | Low | Active-Active multi-region deployment |
| AWS Zone Failure | Medium | Continue to run on 2 out of 3 zones |
| Datacenter Failure | Medium | Migrate more functions to cloud |
| Data store failure | Low | Restore from S3 backups |
| S3 failure | Low | Restore from remote archive |

Until we got really good at mitigating high and medium probability failures, the ROI for mitigating regional failures didn’t make sense. Getting there…
![Page 147: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/147.jpg)
Cloud Security
Fine grain security rather than perimeter
Leveraging AWS Scale to resist DDOS attacks
Automated attack surface monitoring and testing
http://www.slideshare.net/jason_chan/resilience-and-security-scale-lessons-learned
![Page 148: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/148.jpg)
Security Architecture
• Instance Level Security baked into base AMI
– Login: ssh only allowed via portal (not between instances)
– Each app type runs as its own userid app{test|prod}
• AWS Security, Identity and Access Management
– Each app has its own security group (firewall ports)
– Fine grain user roles and resource ACLs
• Key Management
– AWS Keys dynamically provisioned, easy updates
– High grade app specific key management using HSM
![Page 149: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/149.jpg)
Cost-Aware Cloud Architectures
Based on slides jointly developed with Jinesh Varia
@jinman
Technology Evangelist
![Page 150: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/150.jpg)
« Want to increase innovation? Lower the cost of failure »
Joi Ito
![Page 151: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/151.jpg)
Go Global in Minutes
![Page 152: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/152.jpg)
Netflix Examples
• European Launch using AWS Ireland
– No employees in Ireland, no provisioning delay, everything worked
– No need to do detailed capacity planning
– Over-provisioned on day 1, shrunk to fit after a few days
– Capacity grows as needed for additional country launches
• Brazilian Proxy Experiment
– No employees in Brazil, no “meetings with IT”
– Deployed instances into two zones in AWS Brazil
– Experimented with network proxy optimization
– Decided that gain wasn’t enough, shut everything down
![Page 153: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/153.jpg)
Product Launch Agility - Rightsized
Chart: cost ($) across the phases Pre-Launch, Build-out, Testing, Launch, Growth, Growth; series: Demand, Cloud, Datacenter
![Page 154: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/154.jpg)
Product Launch - Under-estimated
Chart phases: Pre-Launch, Build-out, Testing, Launch, Growth, Growth
![Page 155: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/155.jpg)
Product Launch Agility – Over-estimated
Chart: cost ($) across the phases Pre-Launch, Build-out, Testing, Launch, Growth, Growth
![Page 156: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/156.jpg)
Return on Agility = Grow Faster, Less Waste… Profit!
![Page 157: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/157.jpg)
#1 Business Agility by Rapid Experimentation = Profit
Key Takeaways on Cost-Aware Architectures….
![Page 158: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/158.jpg)
When you turn off your cloud resources, you actually stop paying for them
![Page 159: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/159.jpg)
Optimize during a year
Chart: Web Servers by Week (1 to 49), Weekly CPU Load
50% Savings
![Page 160: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/160.jpg)
Chart axes: Business Throughput, Instances
![Page 161: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/161.jpg)
Move to Load-Based Scaling
50%+ Cost Saving
Scale up/down by 70%+
![Page 162: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/162.jpg)
Pay as you go
![Page 163: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/163.jpg)
AWS Support – Trusted Advisor – Your personal cloud assistant
![Page 164: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/164.jpg)
Other simple optimization tips
• Don’t forget to…
– Disassociate unused EIPs
– Delete unassociated Amazon EBS volumes
– Delete older Amazon EBS snapshots
– Leverage Amazon S3 Object Expiration
Janitor Monkey cleans up unused resources
![Page 165: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/165.jpg)
#1 Business Agility by Rapid Experimentation = Profit
#2 Business-driven Auto Scaling Architectures = Savings
Building Cost-Aware Cloud Architectures
![Page 166: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/166.jpg)
When Comparing TCO…
![Page 167: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/167.jpg)
When Comparing TCO…
Make sure that you take all the cost factors into consideration
• Place
• Power
• Pipes
• People
• Patterns
![Page 168: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/168.jpg)
Save more when you reserve
On-demand Instances
• Pay as you go
• Starts from $0.02/Hour
Reserved Instances
• One time low upfront fee + Pay as you go
• $23 for 1 year term and $0.01/Hour
• 1-year and 3-year terms
• Light Utilization RI
• Medium Utilization RI
• Heavy Utilization RI
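Using only the example prices on this slide ($0.02/hour on-demand versus $23 upfront plus $0.01/hour reserved for a 1-year term), the break-even point can be worked out directly. This is an illustrative calculation with made-up helper names; real prices vary by instance type and region.

```python
HOURS_PER_YEAR = 8760

ON_DEMAND_RATE = 0.02    # $/hour, from the slide
RESERVED_RATE = 0.01     # $/hour, from the slide
RESERVED_UPFRONT = 23.0  # $ for a 1-year term, from the slide


def on_demand_cost(hours: float) -> float:
    return hours * ON_DEMAND_RATE


def reserved_cost(hours: float) -> float:
    return RESERVED_UPFRONT + hours * RESERVED_RATE


# The reservation pays off once the hourly saving has covered the upfront fee.
break_even_hours = RESERVED_UPFRONT / (ON_DEMAND_RATE - RESERVED_RATE)
utilization = break_even_hours / HOURS_PER_YEAR

print(f"break-even after about {break_even_hours:.0f} hours "
      f"({utilization:.0%} of a year)")
print(f"full year: on-demand ${on_demand_cost(HOURS_PER_YEAR):.2f}, "
      f"reserved ${reserved_cost(HOURS_PER_YEAR):.2f}")
```

An instance that runs less than about a quarter of the year is cheaper on-demand at these prices; anything running longer than that favors the reservation.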
![Page 169: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/169.jpg)
| Reserved Instance type | Utilization (Uptime) | Break-even point | Ideal For | Savings over On-Demand |
|---|---|---|---|---|
| Light Utilization RI | 10% - 40% | >3.5 < 5.5 months/year | Disaster Recovery (Lowest Upfront) | 56% |
| Medium Utilization RI | 40% - 75% | >5.5 < 7 months/year | Standard Reserved Capacity | 66% |
| Heavy Utilization RI | >75% | >7 months/year | Baseline Servers (Lowest Total Cost) | 71% |

Reserved Instances
• One time low upfront fee + Pay as you go
• $23 for 1 year term and $0.01/Hour
• 1-year and 3-year terms
![Page 170: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/170.jpg)
Mix and Match Reserved Types and On-Demand
Chart: Instances (0 to 12) by Days of Month (1 to 28); layers: Heavy Utilization Reserved Instances, Light RI, On-Demand
![Page 171: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/171.jpg)
Netflix Concept for Regional Failover Capacity
Diagram labels: West Coast, Light Reservations, Heavy Reservations, Normal Use, Failover Use
![Page 172: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/172.jpg)
#1 Business Agility by Rapid Experimentation = Profit
#2 Business-driven Auto Scaling Architectures = Savings
#3 Mix and Match Reserved Instances with On-Demand = Savings
Building Cost-Aware Cloud Architectures
![Page 173: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/173.jpg)
Variety of Applications and Environments
Every Application has….
• Production Fleet
• Dev Fleet
• Test Fleet
• Staging/QA
• Perf Fleet
• DR Site
Every Company has….
• Business App Fleet
• Marketing Site
• Intranet Site
• BI App
• Multiple Products
• Analytics
![Page 174: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/174.jpg)
Consolidated Billing: Single payer for a group of accounts
• One Bill for multiple accounts
• Easy Tracking of account charges (e.g., download CSV of cost data)
• Volume Discounts can be reached faster with combined usage
• Reserved Instances are shared across accounts (including RDS Reserved DBs)
![Page 175: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/175.jpg)
Over-Reserve the Production Environment
• Production Env. Account: 100 Reserved
• QA/Staging Env. Account: 0 Reserved
• Perf Testing Env. Account: 0 Reserved
• Development Env. Account: 0 Reserved
• Storage Account: 0 Reserved
Total Capacity
![Page 176: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/176.jpg)
Consolidated Billing Borrows Unused Reservations
• Production Env. Account: 68 Used
• QA/Staging Env. Account: 10 Borrowed
• Perf Testing Env. Account: 6 Borrowed
• Development Env. Account: 12 Borrowed
• Storage Account: 4 Borrowed
Total Capacity
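The borrowing shown on this slide can be modeled as the owning account taking what it needs and the other linked accounts soaking up the remainder. The function below is a simplified illustration using the slide's numbers; the name, the ordering rule, and the account keys are assumptions, and it is not how AWS actually assigns reservation benefits across accounts.

```python
def allocate_reservations(reserved: int, usage: dict, owner: str):
    """Owner uses reservations first; other accounts borrow what is left, in dict order."""
    remaining = reserved
    covered = {}
    order = [owner] + [account for account in usage if account != owner]
    for account in order:
        take = min(usage[account], remaining)
        covered[account] = take
        remaining -= take
    return covered, remaining


usage = {
    "production": 68,
    "qa_staging": 10,
    "perf_testing": 6,
    "development": 12,
    "storage": 4,
}
covered, unused = allocate_reservations(100, usage, owner="production")
print(covered)
print("unused reservations:", unused)
```

With 100 reservations held by production and 68 in use there, the remaining 32 are fully absorbed by the other four accounts (10 + 6 + 12 + 4), so nothing that was paid for sits idle.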
![Page 177: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/177.jpg)
Consolidated Billing Advantages
• Production account is guaranteed to get burst capacity
– Reservation is higher than normal usage level
– Requests for more capacity always work up to reserved limit
– Higher availability for handling unexpected peak demands
• No additional cost
– Other lower priority accounts soak up unused reservations
– Totals roll up in the monthly billing cycle
![Page 178: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/178.jpg)
#1 Business Agility by Rapid Experimentation = Profit
#2 Business-driven Auto Scaling Architectures = Savings
#3 Mix and Match Reserved Instances with On-Demand = Savings
#4 Consolidated Billing and Shared Reservations = Savings
Building Cost-Aware Cloud Architectures
![Page 179: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/179.jpg)
Continuous optimization in your architecture results in recurring savings as early as your next month’s bill
![Page 180: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/180.jpg)
Right-size your cloud: Use only what you need
• An instance type for every purpose
• Assess your memory & CPU requirements
– Fit your application to the resource
– Fit the resource to your application
• Only use a larger instance when needed
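One way to read "fit the resource to your application" is as a small selection problem: pick the cheapest instance type that meets the measured memory and CPU needs. The sketch below assumes a three-entry catalog using the m1.xl, m2.xl and m3.xl figures quoted on the Instance Type Optimization slide in this deck; the `InstanceType` tuple and the `right_size` function are hypothetical names for illustration.

```python
from typing import NamedTuple, Optional, Sequence


class InstanceType(NamedTuple):
    name: str
    memory_gb: float
    ecu: float
    cents_per_hour: int


# Figures as quoted on the Instance Type Optimization slide (2013 pricing).
CATALOG = [
    InstanceType("m1.xl", 15, 8.0, 48),
    InstanceType("m2.xl", 17, 6.5, 41),
    InstanceType("m3.xl", 15, 13.0, 50),
]


def right_size(memory_gb: float, ecu: float,
               catalog: Sequence[InstanceType] = CATALOG) -> Optional[InstanceType]:
    """Return the cheapest type that satisfies both requirements, or None."""
    candidates = [t for t in catalog if t.memory_gb >= memory_gb and t.ecu >= ecu]
    return min(candidates, key=lambda t: t.cents_per_hour, default=None)


print(right_size(memory_gb=16, ecu=4))    # memory bound workload
print(right_size(memory_gb=12, ecu=10))   # CPU bound workload
print(right_size(memory_gb=64, ecu=4))    # nothing in this catalog fits
```

A real catalog would have many more types and dimensions (network, disk, region), so treat this as the shape of the decision rather than a sizing tool.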
![Page 181: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/181.jpg)
Reserved Instance Marketplace
Buy a shorter-term instance
Buy an instance with a different OS or type
Buy a Reserved Instance in a different region
Sell your unused Reserved Instance
Sell unwanted or over-bought capacity
Further reduce costs by optimizing
![Page 182: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/182.jpg)
Instance Type Optimization
Older m1 and m2 families
• Slower CPUs
• Higher response times
• Smaller caches (6MB)
• Oldest m1.xl 15GB/8ECU/48c
• Old m2.xl 17GB/6.5ECU/41c
• ~16 ECU/$/hr
Latest m3 family
• Faster CPUs
• Lower response times
• Bigger caches (20MB)
• Even faster for Java vs. ECU
• New m3.xl 15GB/13 ECU/50c
• 26 ECU/$/hr – 62% better!
• Java measured even higher
• Deploy fewer instances
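The ECU/$/hr figures above can be reproduced with a few lines of arithmetic. The sketch below uses only the memory, ECU and price numbers quoted on the slide, and assumes "48c" means 48 US cents per hour. The helper names and the 100 ECU fleet target are made up for illustration. The 62% figure appears to come from comparing 26 against the rounded ~16 baseline (26 / 16 = 1.625); against the exact per-type values the gain is a little lower for m1.xl and a little higher for m2.xl.

```python
import math

# name: (memory in GB, ECU, price in US cents per hour), as quoted on the slide
SLIDE_TYPES = {
    "m1.xl": (15, 8.0, 48),
    "m2.xl": (17, 6.5, 41),
    "m3.xl": (15, 13.0, 50),
}


def ecu_per_dollar(ecu, cents_per_hour):
    """Compute units bought per dollar-hour."""
    return ecu / (cents_per_hour / 100)


def instances_needed(target_ecu, ecu_per_instance):
    """Whole instances required to reach a target amount of compute."""
    return math.ceil(target_ecu / ecu_per_instance)


for name, (_, ecu, cents) in SLIDE_TYPES.items():
    print(f"{name}: {ecu_per_dollar(ecu, cents):.1f} ECU/$/hr")

# Improvement of m3.xl over the slide's rounded baseline of ~16 ECU/$/hr.
print(f"gain vs ~16: {ecu_per_dollar(13.0, 50) / 16 - 1:.1%}")

# "Deploy fewer instances": a hypothetical fleet needing 100 ECU in total.
print(instances_needed(100, 8.0), "x m1.xl vs", instances_needed(100, 13.0), "x m3.xl")
```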
![Page 183: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/183.jpg)
#1 Business Agility by Rapid Experimentation = Profit
#2 Business-driven Auto Scaling Architectures = Savings
#3 Mix and Match Reserved Instances with On-Demand = Savings
#4 Consolidated Billing and Shared Reservations = Savings
#5 Always-on Instance Type Optimization = Recurring Savings
Building Cost-Aware Cloud Architectures
![Page 184: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/184.jpg)
Follow the Customer (Run web servers) during the day
Follow the Money (Run Hadoop clusters) at night
[Chart: No. of Instances Running (0 to 16) over the week, Mon to Sun. Series: Auto Scaling Servers, Hadoop Servers, No. of Reserved Instances]
![Page 185: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/185.jpg)
Soaking up unused reservations
The number of unused reserved instances is published as a metric
Netflix Data Science ETL Workload
• Daily business metrics roll-up
• Starts after midnight
• EMR clusters started using hundreds of instances
Netflix Movie Encoding Workload
• Long queue of high and low priority encoding jobs
• Can soak up 1000’s of additional unused instances
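A minimal sketch of the "soak up" idea: read the unused-reservation metric, then start queued batch jobs in priority order while they still fit. This is not Netflix's scheduler. The function names, the job names, the greedy policy and all the numbers are hypothetical.

```python
import heapq


def unused_reservations(reserved, in_use):
    """The published metric: reserved instances nobody is using right now."""
    return max(reserved - in_use, 0)


def soak_up(free_instances, jobs):
    """Greedily start jobs, highest priority first, while they fit.

    jobs is a list of (priority, name, instances_needed) tuples where a
    lower priority number means more important. Jobs that do not fit are
    skipped and would stay queued for the next scheduling pass.
    """
    queue = list(jobs)
    heapq.heapify(queue)
    started = []
    while queue:
        _priority, name, needed = heapq.heappop(queue)
        if needed <= free_instances:
            free_instances -= needed
            started.append(name)
    return started, free_instances


# Hypothetical night-time snapshot: 12 reserved, web tier scaled down to 5.
free = unused_reservations(reserved=12, in_use=5)
queue = [
    (2, "encode-low-priority", 3),
    (0, "etl-daily-rollup", 4),
    (1, "encode-high-priority", 2),
]
print(soak_up(free, queue))
```

When the web tier scales back up in the morning, the metric drops toward zero and no further batch work is started on reserved capacity.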
![Page 186: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/186.jpg)
#1 Business Agility by Rapid Experimentation = Profit
#2 Business-driven Auto Scaling Architectures = Savings
#3 Mix and Match Reserved Instances with On-Demand = Savings
#4 Consolidated Billing and Shared Reservations = Savings
#5 Always-on Instance Type Optimization = Recurring Savings
Building Cost-Aware Cloud Architectures
#6 Follow the Customer (Run web servers) during the day; Follow the Money (Run Hadoop clusters) at night
![Page 187: Yow Conference Dec 2013 Netflix Workshop Slides with Notes](https://reader033.fdocuments.net/reader033/viewer/2022052321/541733ba7bef0a3f248b58d8/html5/thumbnails/187.jpg)
Takeaways
Cloud Native Manages Scale and Complexity at Speed
NetflixOSS makes it easier for everyone to become Cloud Native
Rethink deployments and turn things off to save money!
http://netflix.github.com
http://techblog.netflix.com
http://slideshare.net/Netflix
http://www.linkedin.com/in/adriancockcroft
@adrianco @NetflixOSS @benjchristensen