Microservices reativos usando a stack do Netflix na AWS (Reactive Microservices using the Netflix Stack on AWS) · 2016-08-26


Transcript of the presentation:

Reactive Microservices using the Netflix Stack on AWS

Diego Pacheco, Principal Software Architect at ilegra.com (@diego_pacheco)

www.ilegra.com

NetflixOSS Stack

Why Netflix?

- Billions of Requests Per Day
- 1/3 of US internet bandwidth
- ~10k EC2 Instances
- Multi-Region
- 100s of Microservices
- Innovation + Solid Service
- SOA, Microservices and DevOps Benchmark

- Social Product: Social Network, Video, Docs, Apps, Chat
- Scalability
- Distributed Teams
- Could reach some Web Scale

Netflix vs. My Problem

AWS

Cloud Native Principles

- Stateless Services
- Ephemeral Instances
- Everything fails all the time
- Auto Scaling / Down Scaling
- Multi-AZ and Multi-Region
- No SPOF
- Design for Failure (expected)

- SOA / Microservices
- No Central Database
- NoSQL
- Lightweight Serializable Objects
- Latency-tolerant protocols
- DevOps Enabler
- Immutable Infrastructure
- Anti-Fragility
- Right Set of Assumptions

Microservices

Reactive

Java Drivers vs. REST

Simple View of the Architecture

Diagram: UI -> Zuul -> Microservices -> Cassandra Cluster (NetflixOSS stack).

Zuul

Zuul
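The Zuul slides in the deck are diagram-based. As a minimal sketch (not code from the talk), a Zuul 1.x pre-filter that logs each request before it is routed to an origin microservice could look like this; the class name is illustrative:

import com.netflix.zuul.ZuulFilter;
import com.netflix.zuul.context.RequestContext;

// Minimal Zuul 1.x pre-filter sketch: logs the requested URI before routing.
public class LoggingPreFilter extends ZuulFilter {

    @Override
    public String filterType() {
        return "pre";              // run before the request is routed to the origin
    }

    @Override
    public int filterOrder() {
        return 1;                  // relative order among "pre" filters
    }

    @Override
    public boolean shouldFilter() {
        return true;               // apply to every request
    }

    @Override
    public Object run() {
        RequestContext ctx = RequestContext.getCurrentContext();
        System.out.println("Routing " + ctx.getRequest().getRequestURI());
        return null;               // return value is ignored by Zuul 1.x
    }
}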

Karyon: Microbiology - Nucleus

- Reactive Extensions + Netty Server
- Lower Latency under Heavy Load
- Fewer Locks, Fewer Thread Migrations
- Consumes Less CPU
- Lower Object Allocation Rate

RxNetty

Karyon: CODE

Karyon: Reactive

Karyon: Reactive
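As a minimal sketch of the RxNetty model Karyon builds its reactive endpoints on (assuming RxNetty 0.4.x; the port and response text are illustrative, not from the deck):

import io.netty.buffer.ByteBuf;
import io.reactivex.netty.RxNetty;
import io.reactivex.netty.protocol.http.server.HttpServer;

public class HelloServer {
    public static void main(String[] args) {
        // Each request is handled on the Netty event loop and answered asynchronously,
        // with no servlet thread pool in the way.
        HttpServer<ByteBuf, ByteBuf> server = RxNetty.createHttpServer(8080,
                (request, response) -> response.writeStringAndFlush("Hello from RxNetty"));
        server.startAndWait();   // block so the JVM keeps serving
    }
}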

Eureka and Service Discovery

http://microservices.io/patterns/server-side-discovery.html

Eureka

- AWS Service Registry for Mid-tier Load Balancing and Failover
- REST based
- Karyon and Ribbon Integration
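A minimal sketch of what a plain Eureka 1.x client lookup looks like; Karyon and Ribbon normally hide this plumbing, and the VIP address "recommendation-service" plus the settings behind eureka-client.properties are illustrative, not from the talk:

import com.netflix.appinfo.InstanceInfo;
import com.netflix.appinfo.MyDataCenterInstanceConfig;
import com.netflix.discovery.DefaultEurekaClientConfig;
import com.netflix.discovery.DiscoveryClient;
import com.netflix.discovery.DiscoveryManager;

public class EurekaLookup {
    public static void main(String[] args) {
        // Registers this instance and starts fetching the registry
        // (instance and endpoint settings come from eureka-client.properties).
        DiscoveryManager.getInstance().initComponent(
                new MyDataCenterInstanceConfig(),
                new DefaultEurekaClientConfig());

        DiscoveryClient client = DiscoveryManager.getInstance().getDiscoveryClient();

        // Ask Eureka for the next instance behind an illustrative VIP address.
        InstanceInfo next = client.getNextServerFromEureka("recommendation-service", false);
        System.out.println(next.getHostName() + ":" + next.getPort());
    }
}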

Eureka

Eureka and Service Discovery

Availability

Hystrix
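As a minimal sketch of the command/fallback pattern Hystrix provides (not code from the deck; class, group and values are illustrative):

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;

// Minimal HystrixCommand: isolates the remote call and degrades gracefully on failure.
public class GreetingCommand extends HystrixCommand<String> {

    private final String name;

    public GreetingCommand(String name) {
        super(HystrixCommandGroupKey.Factory.asKey("GreetingGroup"));
        this.name = name;
    }

    @Override
    protected String run() {
        // The guarded remote call would go here; it runs on a Hystrix thread pool
        // with a timeout and a circuit breaker in front of it.
        return "Hello " + name;
    }

    @Override
    protected String getFallback() {
        // Degraded experience instead of a cascading failure.
        return "Hello guest";
    }
}

// Usage: new GreetingCommand("Diego").execute();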

- IPC Library
- Client-Side Load Balancing
- Multi-Protocol (HTTP, TCP, UDP)
- Caching*
- Batching
- Reactive

Ribbon

Ribbon: CODE

Ribbon: CODE
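A minimal sketch of client-side load balancing with Ribbon's RestClient over a static server list (not code from the deck; the client name and hosts are illustrative, and with Eureka the server list is discovered instead of configured):

import com.netflix.client.ClientFactory;
import com.netflix.client.http.HttpRequest;
import com.netflix.client.http.HttpResponse;
import com.netflix.config.ConfigurationManager;
import com.netflix.niws.client.http.RestClient;

import java.net.URI;

public class RibbonExample {
    public static void main(String[] args) throws Exception {
        // Static server list for this sketch; Eureka integration would supply it dynamically.
        ConfigurationManager.getConfigInstance().setProperty(
                "sample-client.ribbon.listOfServers", "host1:8080,host2:8080");

        RestClient client = (RestClient) ClientFactory.getNamedClient("sample-client");
        HttpRequest request = HttpRequest.newBuilder().uri(new URI("/")).build();

        // Each call is load balanced across the configured servers.
        HttpResponse response = client.executeWithLoadBalancer(request);
        System.out.println("Status: " + response.getStatus());
    }
}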

- Reactive Extensions for the JVM
- Async/Event-based programming
- Observer Pattern
- Less than 1 MB
- Heavy usage by the Netflix OSS Stack

RX-Java
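A minimal RxJava 1.x sketch of the Observable/Observer style the stack relies on (values and scheduler choice are illustrative):

import rx.Observable;
import rx.schedulers.Schedulers;

public class RxExample {
    public static void main(String[] args) {
        Observable.just("karyon", "ribbon", "eureka")
                .map(String::toUpperCase)          // transform each event as it flows through
                .subscribeOn(Schedulers.io())      // run the work off the caller thread
                .toBlocking()                      // only for this demo's main()
                .forEach(System.out::println);
    }
}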

Archaius

- Configuration Management Solution
- Dynamic and Typed Properties
- High Throughput and Thread Safety
- Callbacks: Notifications of config changes
- JMX Beans
- Dynamic Config Sources: File, DB, DynamoDB, ZooKeeper
- Based on Apache Commons Configuration
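A minimal Archaius sketch showing a dynamic, typed property with a change callback (the property key "service.greeting" is illustrative, not from the talk):

import com.netflix.config.DynamicPropertyFactory;
import com.netflix.config.DynamicStringProperty;

public class ArchaiusExample {
    private static final DynamicStringProperty GREETING =
            DynamicPropertyFactory.getInstance()
                    .getStringProperty("service.greeting", "hello");

    public static void main(String[] args) {
        // Callback fires when the underlying config source (file, Git-synced dir, DB...) changes.
        GREETING.addCallback(() ->
                System.out.println("greeting changed to " + GREETING.get()));

        System.out.println(GREETING.get());   // current value, no restart needed
    }
}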

Archaius + Git

Diagram: each microservice runs a slave sidecar that pulls property files from a central internal Git repository down to the local file system, where Archaius reads them.

Asgard

Asgard

Packer

Pipeline: Job -> Create -> Bake/Provision -> Launch -> Deploy.

Dynomite: Distributed Cache

https://github.com/Netflix/dynomite

Dynomite

Implements the Amazon Dynamo design; similar to Cassandra, Riak and DynamoDB.

- Strong Consistency – Quorum-like – No Data Loss
- Pluggable
- Scalable
- Redis / Memcached
- Multi-Clients with Dyno
- Can use most Redis commands
- Integrated with Eureka via Prana

- Isolate Failure – Avoid Cascading
- Redundancy – No SPOF
- Auto-Scaling
- Fault Tolerance and Isolation
- Recovery
- Fallbacks and Degraded Experience
- Protect the Customer from failures – Don't throw Failures

Failures vs. Errors

Dynomite: Distributed Cache

Dynomite: Internals

Diagram: Multi-Region Cluster: Dynomite nodes D1 and D2 in Oregon and D3 in N. California, each registered with the Eureka Servers through a Prana sidecar.

Dynomite: CODE
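A rough sketch of the Dyno (Jedis-flavored) client against a single local Dynomite node; not code from the deck, the names and port are illustrative, exact Builder and Host signatures vary across Dyno versions, and in production the HostSupplier would typically come from Eureka:

import com.netflix.dyno.connectionpool.Host;
import com.netflix.dyno.connectionpool.HostSupplier;
import com.netflix.dyno.jedis.DynoJedisClient;

import java.util.Collections;

public class DynoExample {
    public static void main(String[] args) {
        // Static, single-node host supplier for this sketch (rack name is illustrative).
        HostSupplier localhost = () -> Collections.singletonList(
                new Host("localhost", 8102, "local-rack", Host.Status.Up));

        DynoJedisClient dyno = new DynoJedisClient.Builder()
                .withApplicationName("demo-app")          // illustrative application name
                .withDynomiteClusterName("dyn_o_mite")    // illustrative cluster name
                .withHostSupplier(localhost)
                .build();

        // Most Redis commands work as-is through the Dynomite proxy.
        dyno.set("user:42", "diego");
        System.out.println(dyno.get("user:42"));
    }
}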

Dynomite Contributions

https://github.com/Netflix/dynomite

https://github.com/Netflix/dynomite/pull/207

https://github.com/Netflix/dynomite/pull/200

Chaos Engineering

Gatling

- Stress Testing Tool
- Scala DSL
- Runs on top of Akka
- Simple to use

Chaos Arch

Diagram: chaos test setup: ELB -> Zuul -> Microservice N1 / Microservice N2 -> Cassandra Cluster, with Eureka for service discovery.

Running…

Chaos Results and Learnings

Retry configuration and timeouts in Ribbon; use the right retry handler class in Zuul 1.x (by default it retries only on SocketException).

- RequestSpecificRetryHandler (HttpClient exceptions)
- zuul.client.ribbon.MaxAutoRetries=1
- zuul.client.ribbon.MaxAutoRetriesNextServer=1
- zuul.client.ribbon.OkToRetryOnAllOperations=true

- Eureka Timeouts
- It Works
- Everything needs to have redundancy
- ASG is your friend :-)
- Stateless Services FTW

Microservice Producer

Kafka / Storm :: Event System

Chaos Results and Learnings

Before: data was not reaching Elasticsearch; producers were losing data.

After: no data loss; it works.

Changes:
- No logging on the microservice :( (logging was added)
- Code that publishes events wrapped in a try-catch
- Retry config in the Kafka producer raised from 0 to 5
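A hedged sketch of those producer-side changes (broker address, topic and record are illustrative; only the retries going from 0 to 5 comes from the talk):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class EventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-broker:9092");   // illustrative broker
        props.put("acks", "all");     // wait for the full acknowledgement
        props.put("retries", 5);      // retry transient send failures instead of losing events
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Publishing wrapped in try/catch so a failure is logged, not silently dropped.
            try {
                producer.send(new ProducerRecord<>("events", "user:42", "logged_in")).get();
            } catch (Exception e) {
                System.err.println("Failed to publish event: " + e);
            }
        }
    }
}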

Main Challenges

Hacker Mindset

Next Steps

- IPC
- Spinnaker
- Containers
- Client-side Aggregation
- DevOps 2.0 -> Remediation / Skynet

POCs

https://github.com/diegopacheco/netflixoss-pocs

http://diego-pacheco.blogspot.com.br/search/label/netflix?max-results=30

Reactive Microservices using the Netflix Stack on AWS

Diego Pacheco, Principal Software Architect at ilegra.com (@diego_pacheco)

Thank you! (Obrigado!)