MicroServices at Netflix - challenges of scale

MicroServices at NETFLIX: Best Practices & Tools of the Trade
Sudhir Tonse, Manager, Cloud Platform (@stonse, http://linkedin.com/in/sudhirtonse)
Nitesh Kant, Platform Architect (@NiteshKant, http://linkedin.com/in/niteshkant)

Description

MicroServices have caught on as the design pattern of choice for many companies at scale. While MicroServices, and SOA in general, have many positives compared to Monolithic apps, they come with their own challenges, especially when running at scale. These slides were for a 15-minute Meetup talk hosted at Cisco.

Transcript of MicroServices at Netflix - challenges of scale


2. Old DataCenter (2008): Everything in one WebApp (.war). AWS Cloud (2012): 100s of Fine Grained Services.
3. Positives: Isolation brings better Availability*. Independent Speed of Delivery (by different teams). Decentralized Governance (DevOps).
4. Challenges: Distributed Systems are inherently Complex. Operational Overhead (100s of services; a DevOps model is absolutely required). Service Interface Versioning and Mismatches. Testing (you need the entire ecosystem to test). Fan out of Requests -> increased network traffic.
5. Claim: MicroServices increase your overall availability.
6. True? Yes, but wait!
7. One missing ; brought down ALL of Netflix.
8. Introduced MicroServices ...
9. Uptime SLA: Assume a Monolithic Service with 99.99% availability. What if you have ~30 MicroServices, each with a 99.99% SLA?
10. Reality: One rogue (dependency) MicroService CAN bring your whole site down!
11. How?
12. Service Hosed!!
13. Combined Effective SLA (Availability): 0.9999^30 ≈ 0.997, i.e. 99.7% uptime == about 2 HOURS of downtime per month!!
14. But what if I want better? MicroServices do not automatically mean better Availability, unless you have a Fault Tolerant Architecture.
15. Guard your Service! Use Hystrix (http://github.com/netflix/hystrix).
16. Service Discovery & Load Balancers. Choice 1: a Central Load Balancer (H/W or S/W)? OR Choice 2: a Client-based S/W Load Balancer?
17. Client-based Smart Load Balancer: Use Ribbon (http://github.com/netflix/ribbon).
18. Tools of the Trade OR
19. Service Dependency View
20. Distributed Tracing
21. Chattiness (and Fan Out): ~2 Billion Requests per day on the Edge Service result in ~20 Billion Fan out requests across ~100 MicroServices.
22. Fan out
23. IPC 2.0: the next frontier (@NiteshKant)
24. Netflix IPC Stack (1.0): Client: Apache HTTP Client, Hystrix, EVCache, Ribbon (Load Balancing, Eureka Integration), Metrics (Servo). Server (Karyon): Apache Tomcat, Bootstrapping (Governator), Metrics (Servo), Admin Console, Eureka Integration. Client and server talk HTTP. Eureka (Service Registry): Registration, Fetch Registry.
25. Netflix IPC Stack (2.0): Client (Ribbon 2.0): Ribbon Transport, Load Balancing, Eureka Integration, Metrics (Servo), Hystrix, EVCache. Server (Karyon): Bootstrapping (Governator), Metrics (Servo), Admin Console, Eureka Integration. Eureka (Service Registry): Registration, Fetch Registry. Transport: RxNetty (HTTP, UDP, TCP, WebSockets, SSE).
26. Synchronous Applications: Tomcat Connector -> Application code -> Hystrix -> Apache HTTP Client. Conn 1: Thread 1 | Thread 1 | Thread 1* | Thread 1. Conn 2: Thread 2 | Thread 2 | Thread 2* | Thread 2. ... Conn n: Thread n | Thread n | Thread n* | Thread n. (*If there isn't any application-driven thread change.)
27. Synchronous Applications (same diagram as slide 26): A large # of connections / a large # of external dependencies => tons of threads.
28. Asynchronous Applications: RxNetty -> Application code -> Hystrix -> RxNetty, each stage on an event loop (Eventloop 1, Eventloop 4, Eventloop 1*, Eventloop 4*; *if there isn't any application-driven thread change). N connections per event loop. Request processing happens in the event loop. Hystrix is used for throttling, not for achieving asynchronicity. Event loops are shared between IN & OUT.
29. Asynchronous Applications (slide 28 diagram extended to Eventloops 1-4): # of processors => # of event loops. No dependence on # of connections.
30. Takeaway: MicroServices is a better architecture compared to Monolithic Apps. However, be aware of the challenges: use Best Practices and battle-tested OSS components.
31. http://netflix.github.co
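The arithmetic behind slides 9 and 13 can be checked with a short sketch. It assumes that a request needs all ~30 services to be up at the same time, that their failures are independent, and a 30-day month; the function names are illustrative, not from the talk.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month


def composite_availability(per_service: float, count: int) -> float:
    """Availability when a request needs `count` independent services,
    each with availability `per_service`, to all be up at once."""
    return per_service ** count


def downtime_hours_per_month(availability: float) -> float:
    """Expected hours per month during which the system is unavailable."""
    return (1.0 - availability) * HOURS_PER_MONTH


monolith = 0.9999
micro = composite_availability(0.9999, 30)
# Monolith: about 4.3 minutes of downtime per month.
print(f"monolith: {monolith:.4%} up, "
      f"{downtime_hours_per_month(monolith) * 60:.1f} min down/month")
# 30 services in series: about 99.70% up, roughly 2.16 hours down per month.
print(f"30 services: {micro:.4%} up, "
      f"{downtime_hours_per_month(micro):.2f} h down/month")
```

Under these assumptions, each service meeting its own 99.99% SLA still leaves the site with roughly 30 times the monolith's downtime.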
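Slides 14 and 15 recommend guarding every dependency. One of the core ideas in Hystrix is the circuit breaker: after repeated failures, stop calling the dependency and serve a fallback until a cooldown has passed. The sketch below is a minimal, single-threaded illustration of that pattern; `CircuitBreaker` and its parameters are hypothetical and are not Hystrix's API, which also adds timeouts, thread-pool isolation and metrics.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Minimal circuit-breaker sketch (hypothetical; not the Hystrix API).

    CLOSED:    calls reach the dependency; consecutive failures are counted.
    OPEN:      calls skip the dependency and return the fallback at once.
    HALF-OPEN: once `reset_timeout` seconds have passed, the next call is let
               through as a trial; success closes the circuit, failure re-opens it.
    Not thread-safe: concurrent callers could all slip through as trial calls.
    """

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock                  # injectable so tests can fake time
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def is_open(self) -> bool:
        return self.opened_at is not None

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None and \
                self.clock() - self.opened_at < self.reset_timeout:
            return fallback()               # OPEN: fail fast, protect the caller
        try:
            result = fn()                   # CLOSED, or a HALF-OPEN trial call
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # trip (or re-trip) the breaker
            return fallback()
        self.failures = 0                   # success: close the circuit
        self.opened_at = None
        return result


# Usage: a dependency that always times out trips the breaker after 3 failures.
def hosed_dependency() -> str:
    raise TimeoutError("dependency hosed")


breaker = CircuitBreaker(failure_threshold=3, reset_timeout=5.0)
answers = [breaker.call(hosed_dependency, lambda: "cached default") for _ in range(5)]
print(answers, breaker.is_open)  # five fallbacks; calls 4 and 5 never hit the dependency
```

The point of failing fast is that a "hosed" dependency (slide 12) no longer ties up the caller's threads waiting on it, so the failure stays contained instead of spreading to the whole site.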
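Slides 16 and 17 contrast a central load balancer with a client-based one. Here is a minimal sketch of the client-side option. It assumes an in-memory stand-in for a registry such as Eureka and a plain round-robin policy; `ServiceRegistry`, `Instance` and `ClientSideBalancer` are hypothetical names, not Ribbon's or Eureka's API.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instance:
    host: str
    port: int
    up: bool = True                     # health as last reported to the registry


@dataclass
class ServiceRegistry:
    """Hypothetical in-memory stand-in for a registry such as Eureka:
    servers register themselves, clients fetch the instance list."""
    services: Dict[str, List[Instance]] = field(default_factory=dict)

    def register(self, name: str, instance: Instance) -> None:
        self.services.setdefault(name, []).append(instance)

    def fetch(self, name: str) -> List[Instance]:
        return list(self.services.get(name, []))


class ClientSideBalancer:
    """Picks a server inside the calling process (round-robin over healthy
    instances), so no central load balancer sits on the request path."""

    def __init__(self, registry: ServiceRegistry, service: str) -> None:
        self.registry = registry
        self.service = service
        self._counter = itertools.count()

    def choose(self) -> Instance:
        healthy = [i for i in self.registry.fetch(self.service) if i.up]
        if not healthy:
            raise LookupError(f"no healthy instances of {self.service!r}")
        return healthy[next(self._counter) % len(healthy)]


registry = ServiceRegistry()
for port in (8001, 8002, 8003):
    registry.register("ratings", Instance("10.0.0.1", port))
balancer = ClientSideBalancer(registry, "ratings")
print([balancer.choose().port for _ in range(4)])  # [8001, 8002, 8003, 8001]
```

For simplicity this sketch fetches the instance list on every call; a real client would cache the list and refresh it periodically. It would also use smarter rules than round-robin, for example ones that factor in zones or recent errors.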
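Slide 21's figures translate into an average fan-out factor and request rates. This back-of-the-envelope sketch assumes traffic is spread evenly over the day, so real peak rates will be higher; `fan_out_stats` is an illustrative name.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def fan_out_stats(edge_requests_per_day: float, internal_requests_per_day: float) -> dict:
    """Average fan-out factor and request rates, assuming traffic is spread
    evenly over the day (real peaks will be noticeably higher)."""
    return {
        "fan_out_factor": internal_requests_per_day / edge_requests_per_day,
        "edge_rps": edge_requests_per_day / SECONDS_PER_DAY,
        "internal_rps": internal_requests_per_day / SECONDS_PER_DAY,
    }


stats = fan_out_stats(2e9, 20e9)   # figures from slide 21
print(stats)  # fan-out factor 10.0; ~23,000 edge and ~231,000 internal requests/s
```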
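Slides 26 to 29 contrast thread-per-request servers with event-loop servers. The sketch below uses Python stand-ins, not the Netflix stack. A `ThreadPoolExecutor` running blocking calls plays the role of the Tomcat / Apache HTTP Client model, and a single `asyncio` event loop plays the role of RxNetty. The 50 ms downstream latency is an assumption. Per slide 29, RxNetty sizes its event loops by the number of processors; this sketch uses one loop only, to show that the number of in-flight calls no longer dictates the number of threads.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor

DOWNSTREAM_LATENCY = 0.05  # assumed latency of one dependency call, in seconds


def blocking_call(_: int) -> int:
    """Synchronous style: the thread sits idle while it waits on I/O."""
    time.sleep(DOWNSTREAM_LATENCY)
    return threading.get_ident()


def run_synchronous(n: int) -> int:
    """Thread per in-flight request: n concurrent calls occupy n threads.
    Returns how many distinct threads did the work."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        return len(set(pool.map(blocking_call, range(n))))


async def nonblocking_call(_: int) -> int:
    """Asynchronous style: awaiting hands the thread back to the event loop."""
    await asyncio.sleep(DOWNSTREAM_LATENCY)
    return threading.get_ident()


async def _gather(n: int) -> set:
    return set(await asyncio.gather(*(nonblocking_call(i) for i in range(n))))


def run_asynchronous(n: int) -> int:
    """All n calls are multiplexed on the single event-loop thread.
    Returns how many distinct threads did the work."""
    return len(asyncio.run(_gather(n)))


print("sync threads used:", run_synchronous(50))
print("async threads used:", run_asynchronous(200))
```

On a typical run the synchronous version reports many distinct threads, one per concurrent call, while the asynchronous version reports a single thread even with four times as many calls in flight. That is the "No dependence on # of connections" point of slide 29.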