Netflix Global Cloud Architecture
![Page 1: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/1.jpg)
Globally Distributed Cloud Applications at Netflix
October 2012 Adrian Cockcroft @adrianco #netflixcloud
http://www.linkedin.com/in/adriancockcroft
![Page 2: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/2.jpg)
Adrian Cockcroft
• Director, Architecture for Cloud Systems, Netflix Inc.
  – Previously Director for Personalization Platform
• Distinguished Availability Engineer, eBay Inc. 2004-7
  – Founding member of eBay Research Labs
• Distinguished Engineer, Sun Microsystems Inc. 1988-2004
  – 2003-4 Chief Architect High Performance Technical Computing
  – 2001 Author: Capacity Planning for Web Services
  – 1999 Author: Resource Management
  – 1995 & 1998 Author: Sun Performance and Tuning
  – 1996 Japanese Edition of Sun Performance and Tuning
• SPARC & Solarisパフォーマンスチューニング (サンソフトプレスシリーズ)
• More
  – Twitter @adrianco
  – Blog http://perfcap.blogspot.com
  – Presentations at http://www.slideshare.net/adrianco
![Page 3: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/3.jpg)
The Netflix Streaming Service
Now in USA, Canada, Latin America, UK, Ireland, Sweden, Denmark, Norway and Finland
![Page 4: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/4.jpg)
US Non-Member Web Site Advertising and Marketing Driven
![Page 5: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/5.jpg)
Member Web Site Personalization Driven
![Page 6: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/6.jpg)
Streaming Device API
Netflix Ready Devices
From: May 2008 To: May 2010
![Page 7: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/7.jpg)
Content Delivery Service
Distributed storage nodes controlled by Netflix cloud services
![Page 8: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/8.jpg)
Abstract
• Netflix on Cloud – What, Why and When
• Globally Distributed Architecture
• Open Source Components
![Page 9: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/9.jpg)
Why Use Cloud?
![Page 10: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/10.jpg)
Things we don’t do
![Page 11: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/11.jpg)
What Netflix Did
• Moved to SaaS
  – Corporate IT – OneLogin, Workday, Box, Evernote…
  – Tools – Pagerduty, AppDynamics, EMR (Hadoop)
• Built our own PaaS
  – Customized to make our developers productive
  – Large scale, global, highly available, leveraging AWS
• Moved incremental capacity to IaaS
  – No new datacenter space since 2008 as we grew
  – Moved our streaming apps to the cloud
![Page 12: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/12.jpg)
Keeping up with Developer Trends
Trend, and year in production at Netflix:
• Big Data/Hadoop – 2009
• AWS Cloud – 2009
• Application Performance Management – 2010
• Integrated DevOps Practices – 2010
• Continuous Integration/Delivery – 2010
• NoSQL – 2010
• Platform as a Service; Fine grain SOA – 2010
• Social coding, open development/github – 2011
![Page 13: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/13.jpg)
AWS specific feature dependence….
![Page 14: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/14.jpg)
Portability vs. Functionality
• Portability – the Operations focus
  – Avoid vendor lock-in
  – Support datacenter based use cases
  – Possible operations cost savings
• Functionality – the Developer focus
  – Less complex test and debug, one mature supplier
  – Faster time to market for your products
  – Possible developer time/cost savings
![Page 15: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/15.jpg)
Functional PaaS
• IaaS base - all the features of AWS
  – Very large scale, mature, global, evolving rapidly
  – ELB, Autoscale, VPC, SQS, EIP, EMR, etc.
  – E.g. Large files (TB) and multipart writes in S3
• Functional PaaS – Netflix added features
  – Continuous build/deploy, SOA, HA patterns
  – Asgard console, Monkeys, Big data tools
  – Cassandra/Zookeeper data store automation
![Page 16: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/16.jpg)
How Netflix Works
Customer Device (PC, PS3, TV…)
Web Site or Discovery API
User Data
Personalization
Streaming API
DRM
QoS Logging
OpenConnect CDN Boxes
CDN Management and Steering
Content Encoding
Consumer Electronics
AWS Cloud Services
CDN Edge Locations
![Page 17: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/17.jpg)
Component Services (Simplified view using AppDynamics)
![Page 18: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/18.jpg)
Web Server Dependencies Flow (Home page business transaction as seen by AppDynamics)
Start Here
memcached
Cassandra
Web service
S3 bucket
![Page 19: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/19.jpg)
One Request Snapshot (captured because it was unusually slow)
![Page 20: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/20.jpg)
Current Architectural Patterns for Availability
• Isolated Services
  – Resilient Business logic
• Three Balanced Availability Zones
  – Resilient to Infrastructure outage
• Triple Replicated Persistence
  – Durable distributed Storage
• Isolated Regions
  – US and EU don’t take each other down
![Page 21: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/21.jpg)
Isolated Services Test With Chaos Monkey, Latency Monkey
![Page 22: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/22.jpg)
Three Balanced Availability Zones Test with Chaos Gorilla
Cassandra and Evcache Replicas
Zone A
Cassandra and Evcache Replicas
Zone B
Cassandra and Evcache Replicas
Zone C
Load Balancers
![Page 23: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/23.jpg)
Triple Replicated Persistence Cassandra maintenance affects individual replicas
Cassandra and Evcache Replicas
Zone A
Cassandra and Evcache Replicas
Zone B
Cassandra and Evcache Replicas
Zone C
Load Balancers
![Page 24: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/24.jpg)
Isolated Regions
Cassandra Replicas
Zone A
Cassandra Replicas
Zone B
Cassandra Replicas
Zone C
US-East Load Balancers
Cassandra Replicas
Zone A
Cassandra Replicas
Zone B
Cassandra Replicas
Zone C
EU-West Load Balancers
![Page 25: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/25.jpg)
Failure Modes and Effects

| Failure Mode | Probability | Mitigation Plan |
|---|---|---|
| Application Failure | High | Automatic degraded response |
| AWS Region Failure | Low | Wait for region to recover |
| AWS Zone Failure | Medium | Continue to run on 2 out of 3 zones |
| Datacenter Failure | Medium | Migrate more functions to cloud |
| Data store failure | Low | Restore from S3 backups |
| S3 failure | Low | Restore from remote archive |
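The zone-failure row implies a capacity rule: if load is balanced across three zones and the service must keep running on two, each zone needs headroom to absorb its share of the lost zone's traffic. A minimal sketch of that arithmetic, assuming load is evenly balanced and redistributes evenly. The function name is illustrative and not from the slides.

```python
from fractions import Fraction

def max_safe_utilization(zones: int, zones_lost: int = 1) -> Fraction:
    """Highest steady-state utilization per zone that still leaves the
    surviving zones at or below 100% after `zones_lost` zones fail.

    Assumption: load is evenly balanced and redistributes evenly.
    """
    if not 0 <= zones_lost < zones:
        raise ValueError("must keep at least one zone")
    return Fraction(zones - zones_lost, zones)

# Three balanced zones, one lost: run each zone at no more than 2/3.
print(max_safe_utilization(3))            # 2/3
# Load on each survivor if every zone ran at 60% before the failure.
print(float(Fraction(60, 100) * 3 / 2))   # 0.9
```

Under these assumptions, three zones at 60% utilization leave the two survivors at 90% after a zone outage.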
![Page 26: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/26.jpg)
Netflix Deployed on AWS
• Content (2009): Content Management, EC2 Encoding, S3 Petabytes
• Logs (2009): S3 Terabytes, EMR, Hive & Pig, Business Intelligence
• Play (2010): DRM, CDN routing, Bookmarks, Logging
• WWW (2010): Sign-Up, Search Solr, Movie Choosing, Ratings
• API (2010): Metadata, Device Config, TV Movie Choosing, Social Facebook
• CS (2011): International CS lookup, Diagnostics & Actions, Customer Call Log, CS Analytics
CDNs ISPs
Terabits Customers
![Page 27: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/27.jpg)
Cloud Architecture Patterns
Where do we start?
![Page 28: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/28.jpg)
Datacenter to Cloud Transition Goals
• Faster
  – Lower latency than the equivalent datacenter web pages and API calls
  – Measured as mean and 99th percentile
  – For both first hit (e.g. home page) and in-session hits for the same user
• Scalable
  – Avoid needing any more datacenter capacity as subscriber count increases
  – No central vertically scaled databases
  – Leverage AWS elastic capacity effectively
• Available
  – Substantially higher robustness and availability than datacenter services
  – Leverage multiple AWS availability zones
  – No scheduled down time, no central database schema to change
• Productive
  – Optimize agility of a large development team with automation and tools
  – Leave behind complex tangled datacenter code base (~8 year old architecture)
  – Enforce clean layered interfaces and re-usable components
![Page 29: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/29.jpg)
Netflix Datacenter vs. Cloud Arch

| Datacenter | Cloud |
|---|---|
| Central SQL Database | Distributed Key/Value NoSQL |
| Sticky In-Memory Session | Shared Memcached Session |
| Chatty Protocols | Latency Tolerant Protocols |
| Tangled Service Interfaces | Layered Service Interfaces |
| Instrumented Code | Instrumented Service Patterns |
| Fat Complex Objects | Lightweight Serializable Objects |
| Components as Jar Files | Components as Services |
![Page 30: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/30.jpg)
Cassandra on AWS
A highly available and durable deployment pattern
![Page 31: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/31.jpg)
Cassandra Service Pattern
Cassandra Cluster Managed by Priam, between 6 and 72 nodes
Data Access REST Service, Astyanax Cassandra Client
Datacenter Update Flow
Service REST Clients
AppDynamics Service Flow Visualization
![Page 32: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/32.jpg)
Production Deployment
Totally Denormalized Data Model
• Over 50 Cassandra clusters
• Over 500 nodes
• Over 30TB of daily backups
• Biggest cluster 72 nodes
• 1 cluster over 250K writes/s
![Page 33: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/33.jpg)
Astyanax - Cassandra Write Data Flows
Single Region, Multiple Availability Zone, Token Aware
Diagram: token aware clients write to six Cassandra nodes, each with local disks, two per zone across Zones A, B and C.
1. Client writes to local coordinator
2. Coordinator writes to other zones
3. Nodes return ack
4. Data written to internal commit log disks (no more than 10 seconds later)
If a node goes offline, hinted handoff completes the write when the node comes back up. Requests can choose to wait for one node, a quorum, or all nodes to ack the write. SSTable disk writes and compactions occur asynchronously.
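The choice between waiting for one node, a quorum, or all nodes comes down to how many replica acks a write needs before it returns. A minimal sketch, assuming the standard Cassandra quorum rule of floor(RF / 2) + 1. The function name is illustrative and this is not the Astyanax API.

```python
def acks_required(consistency: str, replication_factor: int) -> int:
    """Number of replica acks a write waits for at each consistency level."""
    if consistency == "ONE":
        return 1
    if consistency == "QUORUM":
        return replication_factor // 2 + 1
    if consistency == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {consistency}")

# With three replicas (one per zone): ONE -> 1, QUORUM -> 2, ALL -> 3.
for level in ("ONE", "QUORUM", "ALL"):
    print(level, acks_required(level, replication_factor=3))
```

With a replication factor of 3, a quorum write still succeeds when one replica, or one whole zone, is unavailable.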
![Page 34: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/34.jpg)
Data Flows for Multi-Region Writes
Token Aware, Consistency Level = Local Quorum
Diagram: US clients and EU clients each write to six Cassandra nodes with local disks across Zones A, B and C in their own region; the link between regions has 100+ms latency.
1. Client writes to local replicas
2. Local write acks returned to client, which continues when 2 of 3 local nodes are committed
3. Local coordinator writes to remote coordinator
4. When data arrives, remote coordinator node acks and copies to other remote zones
5. Remote nodes ack to local coordinator
6. Data flushed to internal commit log disks (no more than 10 seconds later)
If a node or region goes offline, hinted handoff completes the write when the node comes back up. Nightly global compare and repair jobs ensure everything stays consistent.
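With a local quorum, the client's wait is set by the second-fastest local replica rather than by the 100+ms inter-region link. A small sketch of that effect, using made-up ack times. All numbers and names below are hypothetical.

```python
def local_quorum_wait_ms(local_ack_ms: list[float]) -> float:
    """Time until a quorum of local replicas has acked.

    The client proceeds at the k-th fastest local ack, where
    k = len(local_ack_ms) // 2 + 1; remote-region acks are not waited for.
    """
    quorum = len(local_ack_ms) // 2 + 1
    return sorted(local_ack_ms)[quorum - 1]

# Hypothetical per-replica ack times in milliseconds.
local = [1.2, 0.9, 4.0]           # three zones in the client's region
remote = [101.5, 100.8, 103.2]    # other region, behind the 100+ms link

print(local_quorum_wait_ms(local))   # 1.2
print(max(local + remote))           # 103.2, if every replica had to ack
```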
![Page 35: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/35.jpg)
ETL for Cassandra
• Data is de-normalized over many clusters!
• Too many to restore from backups for ETL
• Solution – read backup files using Hadoop
• Aegisthus
  – http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html
  – High throughput raw SSTable processing
  – Re-normalizes many clusters to a consistent view
  – Extract, Transform, then Load into Teradata
![Page 36: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/36.jpg)
Benchmarks and Scalability
![Page 37: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/37.jpg)
Cloud Deployment Scalability
New Autoscaled AMI – zero to 500 instances from 21:38:52 - 21:46:32, 7m40s
Scaled up and down over a few days, total 2176 instance launches, m2.2xlarge (4 core 34GB)
Min.  1st Qu.  Median  Mean   3rd Qu.  Max.
41.0  104.2    149.0   171.8  215.8    562.0
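The quoted scale-up time can be checked directly from the two timestamps, which also gives an average launch rate. A short sketch using only the figures on the slide; the variable names are illustrative.

```python
from datetime import datetime

FMT = "%H:%M:%S"
start = datetime.strptime("21:38:52", FMT)
end = datetime.strptime("21:46:32", FMT)

elapsed = (end - start).total_seconds()
instances = 500

print(elapsed)                          # 460.0 seconds, i.e. 7m40s
print(round(instances / elapsed, 2))    # 1.09 instances per second on average
```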
![Page 38: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/38.jpg)
Scalability from 48 to 288 nodes on AWS
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
Chart: Client Writes/s by node count – Replication Factor = 3. Data labels, in increasing node count: 174373, 366828, 537172, 1099837 writes/s.
Used 288 of m1.xlarge: 4 CPU, 15 GB RAM, 8 ECU. Cassandra 0.8.6. Benchmark config only existed for about 1hr.
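The chart's claim is near-linear scaling, which can be checked from its two endpoints. The sketch below assumes the smallest data label belongs to the 48-node run and the largest to the 288-node run, as the slide title suggests. The two middle points are left out because their node counts are not stated here.

```python
# Endpoints of the benchmark, paired with node counts by assumption.
small_nodes, small_writes = 48, 174_373
large_nodes, large_writes = 288, 1_099_837

per_node_small = small_writes / small_nodes
per_node_large = large_writes / large_nodes
node_ratio = large_nodes / small_nodes
write_ratio = large_writes / small_writes

print(round(per_node_small))               # 3633 writes/s per node
print(round(per_node_large))               # 3819 writes/s per node
print(node_ratio, round(write_ratio, 2))   # 6.0 6.31
```

Six times the nodes delivered slightly more than six times the client writes per second, so per-node throughput did not degrade as the cluster grew.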
![Page 39: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/39.jpg)
Cassandra on AWS
The Past
• Instance: m2.4xlarge
• Storage: 2 drives, 1.7TB
• CPU: 8 Cores, 26 ECU
• RAM: 68GB
• Network: 1Gbit
• IOPS: ~500
• Throughput: ~100Mbyte/s
• Cost: $1.80/hr

The Future
• Instance: hi1.4xlarge
• Storage: 2 SSD volumes, 2TB
• CPU: 8 HT cores, 35 ECU
• RAM: 64GB
• Network: 10Gbit
• IOPS: ~100,000
• Throughput: ~1Gbyte/s
• Cost: $3.10/hr
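One way to read the two instance specs side by side is IOPS per dollar-hour. The sketch below uses only the approximate figures quoted on the slide, so the results are order-of-magnitude estimates rather than measurements.

```python
past = {"instance": "m2.4xlarge", "iops": 500, "cost_per_hr": 1.80}
future = {"instance": "hi1.4xlarge", "iops": 100_000, "cost_per_hr": 3.10}

def iops_per_dollar_hour(spec: dict) -> int:
    """Approximate IOPS bought per dollar of hourly instance cost."""
    return round(spec["iops"] / spec["cost_per_hr"])

for spec in (past, future):
    print(spec["instance"], iops_per_dollar_hour(spec))
# m2.4xlarge 278
# hi1.4xlarge 32258
```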
![Page 40: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/40.jpg)
Cassandra Disk vs. SSD Benchmark Same Throughput, Lower Latency, Half Cost
![Page 41: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/41.jpg)
Availability and Resilience
![Page 42: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/42.jpg)
Chaos Monkey
http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html
• Computers (Datacenter or AWS) randomly die
  – Fact of life, but too infrequent to test resiliency
• Test to make sure systems are resilient
  – Allow any instance to fail without customer impact
• Chaos Monkey hours
  – Monday-Friday 9am-3pm random instance kill
• Application configuration option
  – Apps now have to opt-out from Chaos Monkey
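The rules above (weekday working hours only, random instance selection, opt-out per application) can be sketched in a few lines. This is an illustration of the policy as described on the slide, not the Chaos Monkey code. All function names, app names and instance ids are hypothetical, and it only picks a victim without terminating anything.

```python
import random
from datetime import datetime

def in_chaos_hours(now: datetime) -> bool:
    """True Monday-Friday between 9am and 3pm."""
    return now.weekday() < 5 and 9 <= now.hour < 15

def pick_victim(instances, opted_out, now, rng=random):
    """Return one instance id to terminate, or None.

    `instances` maps app name -> list of instance ids; apps listed in
    `opted_out` are skipped. All names here are hypothetical.
    """
    if not in_chaos_hours(now):
        return None
    candidates = [i for app, ids in instances.items()
                  if app not in opted_out for i in ids]
    return rng.choice(candidates) if candidates else None

fleet = {"api": ["i-01", "i-02"], "billing": ["i-03"]}
tuesday_10am = datetime(2012, 10, 2, 10, 0)
print(pick_victim(fleet, opted_out={"billing"}, now=tuesday_10am))
```

Restricting kills to working hours means engineers are at their desks when a weakness is exposed.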
![Page 43: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/43.jpg)
Responsibility and Experience
• Make developers responsible for failures
  – Then they learn and write code that doesn’t fail
• Use Incident Reviews to find gaps to fix
  – Make sure it’s not about finding “who to blame”
• Keep timeouts short, fail fast
  – Don’t let cascading timeouts stack up
• Make configuration options dynamic
  – You don’t want to push code to tweak an option
![Page 44: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/44.jpg)
Resilient Design – Circuit Breakers
http://techblog.netflix.com/2012/02/fault-tolerance-in-high-volume.html
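For readers new to the pattern, here is a generic circuit breaker sketch: after repeated failures it stops calling the dependency and returns a fallback immediately, then allows a trial call once a cool-down has passed. This is a textbook version under assumed thresholds, not the Netflix implementation described in the linked post. The class and parameter names are illustrative.

```python
import time

class CircuitBreaker:
    """Generic circuit breaker: open after repeated failures, fail fast
    while open, and allow a trial call after a cool-down period."""

    def __init__(self, max_failures=3, reset_after_s=5.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                return fallback()      # open: fail fast, skip the dependency
            self.opened_at = None      # cool-down elapsed: allow a trial call
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()          # degraded response instead of an error
        self.failures = 0              # success closes the circuit
        return result

def flaky():
    raise TimeoutError("dependency timed out")

breaker = CircuitBreaker(max_failures=2, reset_after_s=30.0)
for _ in range(3):
    print(breaker.call(flaky, fallback=lambda: "degraded response"))
```

The third call in the loop never reaches `flaky`: the circuit is already open, so the fallback returns immediately, which is what keeps timeouts from stacking up across callers.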
![Page 45: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/45.jpg)
Distributed Operational Model
• Developers
  – Provision and run their own code in production
  – Take turns to be on call if it breaks (pagerduty)
  – Configure autoscalers to handle capacity needs
• DevOps and PaaS (aka NoOps)
  – DevOps is used to build and run the PaaS
  – PaaS constrains Dev to use automation instead
  – PaaS puts more responsibility on Dev, with tools
![Page 46: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/46.jpg)
Culture
![Page 47: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/47.jpg)
Unconventional Culture
See culture deck at http://jobs.netflix.com
• Brave/Aggressive from the top down
• Focus on talent density above everything
• Reduce process, remove complexity
• Freedom and Responsibility
• One product focus for the whole company
• (Almost) full information sharing across co.
• Simplified manager’s role
![Page 48: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/48.jpg)
Manager’s Role
• Hiring, Architecture, Project Management
• No vacation policy to track
• (Almost) no remote employees or contractors
• No bonuses to allocate
• No expenses to approve
• Pay mark to market handled at VP level
![Page 49: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/49.jpg)
Netflix Organization
DevOps Org Reporting into Product Group, not ITops
CEO – Reed Hastings
CPO – Chief Product Officer – Neil Hunt
VP - Cloud and Platform Engineering - Yury
Architecture
Future planning Security Arch Efficiency
AWS VPC Hyperguard
Powerpoint :)
Platform and Persistence Engineering
Base Platform Zookeeper
Cassandra Ops
AWS Instances
Cloud Solutions
Monitoring Monkeys Build Tools
AWS Instances AWS API
Cloud Ops Reliability Engineering
Alert Routing Incident Lifecycle
PagerDuty
Personalization Platform and Performance Eng
Metadata Benchmarking Memcached
AWS Instances
Membership and Billing
Data sources Vault processing
Cassandra
Data Science Platform
Business Intelligence
Hadoop on EMR
![Page 50: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/50.jpg)
Build Your Own PaaS
![Page 51: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/51.jpg)
Components
• Continuous build framework turns code into AMIs
• AWS accounts for test, production, etc.
• Cloud access gateway
• Service registry
• Configuration properties service
• Persistence services
• Monitoring, alert forwarding
• Backups, archives
![Page 52: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/52.jpg)
Netflix Open Source Strategy
• Release PaaS Components git-by-git
  – Source at github.com/netflix – we build from it…
  – Intros and techniques at techblog.netflix.com
  – Blog post or new code every few weeks
• Motivations
  – Give back to Apache licensed OSS community
  – Motivate, retain, hire top engineers
  – “Peer pressure” code cleanup, external contributions
![Page 53: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/53.jpg)
Instance creation
ASG / Instance started Instance Running
Asgard
Autoscaling scripts Odin
Bakery & Build tools
Base AMI
Application Code
Instance
Image baked
![Page 54: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/54.jpg)
Application Launch
Registering, configuration
Eureka
Entrypoints Archaius
Governator (Guice)
Async logging
Servo
Application initializing
![Page 55: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/55.jpg)
Runtime
Managing service
Resiliency aids
Priam
Exhibitor
Explorers
NIWS LB
Astyanax
Curator
Dependency Command
REST client
Chaos Monkey Latency Monkey Janitor Monkey Cass JMeter
Calling other services
![Page 56: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/56.jpg)
Open Source Projects
Legend: Github / Techblog, Apache Contributions, Techblog Post, Coming Soon
Priam Cassandra as a Service
Astyanax Cassandra client for Java
CassJMeter Cassandra test suite
Cassandra Multi-region EC2 datastore support
Aegisthus Hadoop ETL for Cassandra
Explorers
Governator - Library lifecycle and dependency injection
Odin Workflow orchestration
Async logging
Exhibitor Zookeeper as a Service
Curator Zookeeper Patterns
EVCache Memcached as a Service
Eureka / Discovery Service Directory
Archaius Dynamics Properties Service
EntryPoints
Server-side latency/error injection
REST Client + mid-tier LB
Configuration REST endpoints
Servo and Autoscaling Scripts
Honu Log4j streaming to Hadoop
Circuit Breaker Robust service pattern
Asgard - AutoScaleGroup based AWS console
Chaos Monkey Robustness verification
Latency Monkey
Janitor Monkey
Bakeries and AMI
Build dynaslaves
![Page 57: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/57.jpg)
Roadmap for 2012
• More resiliency and improved availability
• More automation, orchestration
• “Hardening” the platform, code clean-up
• Lower latency for web services and devices
• IPv6 – now running in prod, rollout in process
• More open sourced components
• See you at AWS Re:Invent in November…
![Page 58: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/58.jpg)
Takeaway
Netflix has built and deployed a scalable global Platform as a Service.
Key components of the Netflix PaaS are being released as Open Source projects so you can build your own custom PaaS.
http://github.com/Netflix
http://techblog.netflix.com
http://slideshare.net/Netflix
http://www.linkedin.com/in/adriancockcroft
@adrianco #netflixcloud
![Page 59: Netflix Global Cloud Architecture](https://reader030.fdocuments.net/reader030/viewer/2022012322/54b76a104a795971038b4625/html5/thumbnails/59.jpg)
Amazon Cloud Terminology Reference
See http://aws.amazon.com/
This is not a full list of Amazon Web Service features
• AWS – Amazon Web Services (common name for Amazon cloud)
• AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)
• EC2 – Elastic Compute Cloud
  – Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations.
  – Instance – a running computer system. Ephemeral, when it is de-allocated nothing is kept.
  – Reserved Instances – pre-paid to reduce cost for long term usage
  – Availability Zone – datacenter with own power and cooling hosting cloud instances
  – Region – group of Avail Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan, SA-Brazil, US-Gov
• ASG – Auto Scaling Group (instances booting from the same AMI)
• S3 – Simple Storage Service (http access)
• EBS – Elastic Block Store (network disk filesystem can be mounted on an instance)
• RDS – Relational Database Service (managed MySQL master and slaves)
• DynamoDB/SDB – Simple Data Base (hosted http based NoSQL datastore, DynamoDB replaces SDB)
• SQS – Simple Queue Service (http based message queue)
• SNS – Simple Notification Service (http and email based topics and messages)
• EMR – Elastic Map Reduce (automatically managed Hadoop cluster)
• ELB – Elastic Load Balancer
• EIP – Elastic IP (stable IP address mapping assigned to instance or ELB)
• VPC – Virtual Private Cloud (single tenant, more flexible network and security constructs)
• DirectConnect – secure pipe from AWS VPC to external datacenter
• IAM – Identity and Access Management (fine grain role based security keys)