How Netflix does Microservices

Manuel Correa

Microservices

“Small Autonomous Services that Work Together” (Sam Newman)

Microservices: Conway’s Law

“Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure.”

Microservices Principles

http://www.slideshare.net/spnewman/principles-of-microservices-ndc-2014

- Modeled around Business Domain
- Culture of Automation
- Hide Implementation Details
- Decentralize All Things
- Design for Failure
- Highly Observable
- Deploy Independently

Culture of Automation

- Immutable infrastructure in AWS

Decentralize All Things: Hide Implementation Details

Free for all:
- NodeJS
- Ruby
- Clojure

Agree on:
- Routing
- Contracts
- Resiliency
- Discovery
- How services work together

Decentralize All Things: Smart Endpoints and Dumb Pipes (Zuul)

- Dynamic routing
- Gateway for all Netflix services
- Pluggable system that takes care of:
  - Authentication and authorization
  - Monitoring and tracking requests
  - Load shedding
  - First level of resilience
  - Caching at the gateway level
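The gateway duties listed above can be pictured as a chain of pluggable filters that run before a request is forwarded. The Python sketch below is a toy model under stated assumptions, not Zuul's actual filter API: the `X-Auth-Token` header, the token set, and the in-flight limit used for load shedding are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Request:
    path: str
    headers: Dict[str, str] = field(default_factory=dict)


@dataclass
class Response:
    status: int
    body: str


# A filter returns None to let the request continue, or a Response to stop it.
Filter = Callable[[Request], Optional[Response]]


def counting_filter(counts: Dict[str, int]) -> Filter:
    """Monitoring: count requests per path (never rejects)."""
    def run(req: Request) -> Optional[Response]:
        counts[req.path] = counts.get(req.path, 0) + 1
        return None
    return run


def auth_filter(valid_tokens: set) -> Filter:
    """Authentication: reject requests without a known token (hypothetical header)."""
    def run(req: Request) -> Optional[Response]:
        if req.headers.get("X-Auth-Token") not in valid_tokens:
            return Response(401, "unauthorized")
        return None
    return run


class Gateway:
    def __init__(self, backend: Callable[[Request], Response],
                 filters: List[Filter], max_in_flight: int):
        self.backend = backend
        self.filters = filters
        self.max_in_flight = max_in_flight
        self.in_flight = 0

    def handle(self, req: Request) -> Response:
        # Load shedding: refuse new work up front instead of queueing it.
        if self.in_flight >= self.max_in_flight:
            return Response(503, "overloaded")
        for f in self.filters:
            rejected = f(req)
            if rejected is not None:
                return rejected
        self.in_flight += 1
        try:
            return self.backend(req)
        finally:
            self.in_flight -= 1
```

With this shape, adding a concern such as caching means adding another filter rather than changing the services behind the gateway.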


Decentralize All Things: Service Discovery (Eureka)

- Service registry
- Middle-tier load balancing
- Carries metadata for each service
- Dynamic service repository
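A service registry boils down to three operations: register, renew a lease with heartbeats, and look up live instances. The sketch below is a hypothetical in-memory model of that idea, not Eureka's REST API; the lease length, the injectable clock, and the metadata keys are assumptions.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Instance:
    instance_id: str
    host: str
    port: int
    metadata: Dict[str, str] = field(default_factory=dict)
    last_heartbeat: float = 0.0


class ServiceRegistry:
    """In-memory registry: instances register, renew via heartbeats, and expire
    once they miss their lease."""

    def __init__(self, lease_seconds: float = 90.0,
                 clock: Callable[[], float] = time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock
        self._apps: Dict[str, Dict[str, Instance]] = {}

    def register(self, app: str, inst: Instance) -> None:
        inst.last_heartbeat = self.clock()
        self._apps.setdefault(app, {})[inst.instance_id] = inst

    def heartbeat(self, app: str, instance_id: str) -> None:
        self._apps[app][instance_id].last_heartbeat = self.clock()

    def lookup(self, app: str) -> List[Instance]:
        """Return only instances whose lease is still valid."""
        now = self.clock()
        return [i for i in self._apps.get(app, {}).values()
                if now - i.last_heartbeat <= self.lease_seconds]
```

A middle-tier client would call `lookup` and load-balance across the result, possibly using metadata such as an availability zone to prefer nearby instances.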

Decentralize All Things: Dynamic Configuration (Archaius)

- Dynamic typed properties, which double as a feature-flag system
- Properties can be changed at runtime
- Polling framework
- Multiple sources (e.g., Cassandra and DynamoDB)
- Callbacks when a property changes

CB’s Zuul uses Archaius to change properties across AWS regions, HttpClient configurations, and logging levels.
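The dynamic-property idea (typed values, several sources, polling, change callbacks) can be modeled in a few lines. This Python sketch is an illustration, not Archaius' API: the property names, the dict-backed sources standing in for stores such as Cassandra or DynamoDB, and the rule that earlier sources win are all assumptions. A real setup would call `poll_once` on a timer.

```python
from typing import Any, Callable, Dict, List


class DynamicProperty:
    """A typed property whose value may change at runtime."""

    def __init__(self, name: str, default: Any, cast: Callable[[str], Any]):
        self.name = name
        self.default = default
        self.cast = cast
        self._value = default
        self._callbacks: List[Callable[[Any], None]] = []

    def get(self) -> Any:
        return self._value

    def add_callback(self, callback: Callable[[Any], None]) -> None:
        self._callbacks.append(callback)

    def update(self, raw: Any) -> None:
        # A missing key falls back to the default; callbacks fire only on change.
        new_value = self.default if raw is None else self.cast(raw)
        if new_value != self._value:
            self._value = new_value
            for callback in self._callbacks:
                callback(new_value)


class PollingConfig:
    """Merges several sources on each poll; earlier sources take precedence."""

    def __init__(self, sources: List[Callable[[], Dict[str, str]]]):
        self.sources = sources
        self.props: Dict[str, DynamicProperty] = {}

    def get_property(self, name: str, default: Any,
                     cast: Callable[[str], Any]) -> DynamicProperty:
        prop = DynamicProperty(name, default, cast)
        self.props[name] = prop
        return prop

    def poll_once(self) -> None:
        merged: Dict[str, str] = {}
        for source in reversed(self.sources):  # later sources get overridden
            merged.update(source())
        for name, prop in self.props.items():
            prop.update(merged.get(name))


def parse_bool(raw: str) -> bool:
    return raw.strip().lower() == "true"
```

A feature flag is then just a boolean property whose callback switches behavior without a redeploy.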

Design for Failure (Ribbon)

- HTTP library
- Client-side load balancing
- Built-in retries
- Caching
- Request batching
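Client-side load balancing with built-in retries means the caller itself picks a server and, on a connection failure, tries the next one. This is a minimal round-robin sketch under assumed names (`RoundRobinClient`, the server addresses), not Ribbon's API; caching and request batching are left out.

```python
import itertools
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RoundRobinClient:
    """Picks servers in round-robin order; on ConnectionError, retries the call
    on the next server, up to max_retries extra attempts."""

    def __init__(self, servers: List[str], max_retries: int = 2):
        if not servers:
            raise ValueError("at least one server is required")
        self.max_retries = max_retries
        self._servers = itertools.cycle(servers)

    def execute(self, call: Callable[[str], T]) -> T:
        last_error: Exception = ConnectionError("no attempt made")
        for _ in range(self.max_retries + 1):
            server = next(self._servers)
            try:
                return call(server)
            except ConnectionError as err:
                last_error = err  # remember the failure, move to the next server
        raise last_error
```

Because the balancing happens in the caller, there is no central load balancer to become a bottleneck or a single point of failure.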

Design for Failure (Hystrix)

- Java resilience library
- Stops cascading failures
- Falls back and degrades gracefully when possible
- Real-time monitoring
- Circuit breaker pattern
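The circuit breaker pattern stops hammering a failing dependency: after repeated failures the circuit opens and calls go straight to a fallback, and after a cool-down a single trial call decides whether to close it again. The sketch below is a simplified version that trips on consecutive failures; Hystrix itself trips on an error percentage measured over a rolling window, so treat the thresholds and names here as hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures. While open, calls go
    straight to the fallback. After `reset_timeout` seconds one trial call is let
    through (half-open): success closes the circuit, failure re-opens it."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means closed

    @property
    def is_open(self) -> bool:
        return self.opened_at is not None

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()  # open: fail fast without touching the dependency
            # otherwise half-open: let this call through as a trial
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None
        return result
```

Failing fast while the circuit is open is what stops a slow dependency from tying up the caller's threads and cascading the failure upstream.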

Design for Failure

- No service has a 100% SLA
- With 30 dependencies at 99.99% uptime each: 0.9999^30 ≈ 99.7% uptime
- 0.3% of 1 billion requests = 3,000,000 failures
- 2+ hours of downtime per month, even if all dependencies have excellent uptime
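The figures on this slide can be reproduced directly. The exponent implies 30 dependencies at 99.99% uptime each; the 30-day month is an added assumption used to turn the failure rate into hours.

```python
DEPENDENCIES = 30
PER_DEPENDENCY_UPTIME = 0.9999
REQUESTS = 1_000_000_000
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month

# Uptimes multiply when a request needs every dependency to succeed.
combined = PER_DEPENDENCY_UPTIME ** DEPENDENCIES
failed_requests = round((1 - combined) * REQUESTS)
downtime_hours = (1 - combined) * HOURS_PER_MONTH

print(f"combined uptime: {combined:.2%}")
print(f"failed requests: {failed_requests:,}")
print(f"downtime: {downtime_hours:.2f} h/month")
```

The combined uptime comes out at about 99.70%, which is roughly 3 million failed requests per billion and a little over 2 hours of downtime in a 30-day month.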

Diagram: Service1, Service2, and Service3, with a Fallback

Design for Failure: Circuit Breaker Pattern

Design for Failure: Hystrix Dashboard

Decentralized Architecture

Demo: May the demo gods be with us...

Diagram: a Client calling /service/jobs and /service/resumes, with port labels :9292
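The demo routes requests by path prefix, such as /service/jobs and /service/resumes. A gateway's routing step can be sketched as a longest-prefix match against a route table; the service names below are hypothetical, and in a full setup they would be resolved to live instances through service discovery.

```python
from typing import Dict, Optional, Tuple

# Hypothetical route table: path prefix -> backend service name.
ROUTES: Dict[str, str] = {
    "/service/jobs": "jobs-service",
    "/service/resumes": "resumes-service",
}


def route(path: str, routes: Dict[str, str] = ROUTES) -> Optional[Tuple[str, str]]:
    """Return (service, remaining path) for the longest matching prefix, or None."""
    best: Optional[Tuple[str, str]] = None
    for prefix, service in routes.items():
        # Match whole path segments only, so "/service/jobsearch" is not "/service/jobs".
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, service)
    if best is None:
        return None
    prefix, service = best
    return service, path[len(prefix):] or "/"
```

Keeping this table in the gateway lets services move or split without clients changing the URLs they call.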

Diagram: Client, Zuul, Hystrix, Ribbon, and a SERVICE, with a Fallback Backup Service and a Fallback Cache; port labels :9090, :9292, :9393
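The demo diagram pairs the service with a backup service and a fallback cache. One way to read that is as an ordered fallback chain; the order (primary, then backup, then last known good value) and every name in this sketch are assumptions, not details taken from the demo.

```python
from typing import Callable, Dict, List, Optional


class FallbackChain:
    """Tries each step in order (primary first, then fallbacks). The last good
    value per key is cached and served when every step fails."""

    def __init__(self, steps: List[Callable[[str], str]]):
        self.steps = steps
        self.cache: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        for step in self.steps:
            try:
                value = step(key)
            except Exception:
                continue  # this step failed; degrade to the next one
            self.cache[key] = value
            return value
        return self.cache.get(key)  # may be stale, or None if never seen
```

Serving a stale cached answer is a deliberate degradation: the user sees slightly old data instead of an error.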

Design for Failure

- Testing resiliency in production:
  - Chaos Monkey => kills instances randomly
  - Latency Monkey => induces latency in services
  - Chaos Gorilla => simulates an entire availability zone (AZ) going down
  - Conformity Monkey => makes sure instances follow good practices
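Chaos Monkey's core loop is simple: per group of instances, occasionally pick one at random and terminate it, so teams find out whether their services survive the loss. This hypothetical simulation only removes the instance from an in-memory list; the group names, instance ids, and probability parameter are illustrative, and a real tool would call the cloud provider's termination API instead.

```python
import random
from typing import Dict, List, Optional


def chaos_monkey(groups: Dict[str, List[str]], probability: float,
                 rng: random.Random) -> Dict[str, Optional[str]]:
    """For each group, with the given probability, pick one instance and
    'terminate' it by removing it from the list (simulation only).
    Returns group -> terminated instance id, or None if the group was spared."""
    victims: Dict[str, Optional[str]] = {}
    for group, instances in groups.items():
        if instances and rng.random() < probability:
            victim = rng.choice(instances)
            instances.remove(victim)
            victims[group] = victim
        else:
            victims[group] = None
    return victims
```

Passing in a seeded `random.Random` keeps a run reproducible, which helps when replaying an experiment.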

Highly Observable

- Hystrix Stream aggregator

- AWS Change Tracker

- AWS Usage Tracker
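A Hystrix stream aggregator (Netflix's is Turbine) merges per-instance metric streams into one cluster-wide view per command. This sketch assumes a simplified sample format of (command, requests, errors) tuples rather than the real JSON stream, and the command names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical per-instance sample: (command name, request count, error count).
Sample = Tuple[str, int, int]


def aggregate(streams: Iterable[Iterable[Sample]]) -> Dict[str, Dict[str, float]]:
    """Sum counters per command across all instance streams and add an error %."""
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: {"requests": 0, "errors": 0})
    for stream in streams:
        for command, requests, errors in stream:
            totals[command]["requests"] += requests
            totals[command]["errors"] += errors
    result: Dict[str, Dict[str, float]] = {}
    for command, t in totals.items():
        rate = t["errors"] / t["requests"] if t["requests"] else 0.0
        result[command] = {"requests": t["requests"], "errors": t["errors"],
                           "error_pct": 100.0 * rate}
    return result
```

A dashboard fed by this view shows one line per command for the whole cluster instead of one per instance.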

Take Aways

http://www.slideshare.net/spnewman/principles-of-microservices-ndc-2014

- Modeled around Business Domain
- Culture of Automation
- Hide Implementation Details
- Decentralize All Things
- Design for Failure
- Highly Observable
- Deploy Independently

Take Aways

- Each service must have a fallback strategy by design
- A routing layer is essential to the architecture
- Making services work together requires highly reliable infrastructure around the microservices

Take Aways: Conway’s Law

“Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure.”