Service Mesh - QConSP…

Service Mesh Technology Deep dive and reasons for adoption Diógenes Rettori - @rettori Executive Director - Cloud Architecture JPMorgan Chase & Co.


One Message.


Agenda

Quick Introduction to Service Mesh 5m

Service Mesh x Distributed Systems 5m

Technology Options 2m

Istio and Linkerd Deep Dive 20m

How to choose 5m

Recommended tools 3m


Service Mesh: Istio & Linkerd

Diogenes Rettori & Tiago Vieira

Currently Writing


Quick Intro to Service Mesh

booking

payments

catalog

notifications


[Diagram: booking, payments, catalog and notifications deployed across AZ-1 and AZ-2]


[Diagram: booking with multiple replicas of payments, catalog and notifications]


[Diagram: booking, payments, catalog and notifications, with replicas of catalog and notifications, labeled "1000 / day" and "50 / second"]


A Service Mesh is an intelligent communications network that understands the relationships between microservices and intervenes in the traffic to increase the reliability and security of the whole system.


Addresses needs of distributed systems.


Service Mesh x Distributed Systems

Fallacies of Distributed Systems

- The network is reliable.
- Latency is zero.
- Bandwidth is infinite.
- The network is secure.
- Topology doesn't change.
- There is one administrator.
- Transport cost is zero.
- The network is homogeneous.

Given that they are fallacies, we should assume the opposite.

Service Mesh features

- Circuit breaking and load balancing
- Timeouts and retries
- Rate limiting
- Mutual TLS
- Service discovery
- Role-based access control
- gRPC / RSocket
- Dynamic routing: A/B, canary deployments


Technology Options

AWS App Mesh

Service Mesh


Istio & Linkerd Deep Dive

● Traffic Management

● Security

● Installation / Configuration

● Supported Environments

● Observability

● Policy Management

● Performance


Traffic Management

Feature | Istio | Istio comments | Linkerd | Linkerd comments
TCP Proxying | Yes | - | Yes | -
Load Balancing | Yes | Supports Round Robin, Least Conn, Random and Passthrough | Yes | Uses EWMA (exponentially weighted moving average) to identify optimal targets
Subset Load Balancing | Yes | Useful for Canary, Blue/Green deployments and A/B tests | No | -
Session Affinity | Yes | Cookie hash-based LB for HTTP, providing soft session affinity | No | -
Circuit Breaking | Yes | As Outlier Detection | See comments | No configuration options. EWMA balancing will give less preference to unhealthy targets, achieving something circuit-breaker-like
Retries | Yes | - | Yes | -


Traffic Management

Points to Consider

- Load Balancing algorithms

- Subset Load Balancing

- Session Affinity

[Diagram: subset load balancing across catalog v1, v2 and v3, and across catalog blue and green]

Istio: Round Robin, Least Conn, Random and Passthrough.

Linkerd: Peak EWMA. Maintain a moving average of each replica's round-trip time, weighted by the number of outstanding requests, and distribute traffic to replicas where that cost function is smallest.
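The Peak EWMA idea described on this slide can be sketched in a few lines of Python. This is a simplified illustration, not Linkerd's implementation: the `Replica` class, the `pick` function, the smoothing constant `lam = 0.5` and the `outstanding + 1` weighting are all assumptions made for the example, and the real algorithm additionally reacts to latency peaks and decays old samples over time.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical per-replica state kept by the load balancer."""
    name: str
    ewma_rtt_ms: float = 0.0  # moving average of observed round-trip times
    outstanding: int = 0      # requests sent but not yet answered
    samples: int = 0          # number of round-trip times observed so far

    def cost(self) -> float:
        # Latency estimate weighted by in-flight load. The +1 keeps an idle
        # replica ranked by its latency instead of collapsing to zero.
        return self.ewma_rtt_ms * (self.outstanding + 1)

    def observe(self, rtt_ms: float, lam: float = 0.5) -> None:
        # First sample seeds the average; later samples are blended in.
        if self.samples == 0:
            self.ewma_rtt_ms = rtt_ms
        else:
            self.ewma_rtt_ms = lam * rtt_ms + (1 - lam) * self.ewma_rtt_ms
        self.samples += 1


def pick(replicas):
    """Return the replica whose cost function is currently smallest."""
    return min(replicas, key=lambda r: r.cost())
```

A caller would invoke `pick` before each request, increment `outstanding` on the chosen replica, and call `observe` (and decrement `outstanding`) when the response arrives. A fast replica that is already busy can therefore lose out to a slower idle one.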


Peak EWMA

$$\mathrm{EWMA}_t = \lambda Y_t + (1 - \lambda)\,\mathrm{EWMA}_{t-1}$$

For $t = 1, 2, \ldots, n$, where:

- $\mathrm{EWMA}_0$ is the mean of historical data (target)
- $Y_t$ is the observation at time $t$
- $n$ is the number of observations to be monitored, including $\mathrm{EWMA}_0$
- $0 < \lambda \le 1$ is a constant that determines the depth of memory of the EWMA.


T | node1 | node2 | node3 | EWMA node1 | EWMA node2 | EWMA node3
1 | 32 | 43 | 33 | - | - | -
2 | 35 | 64 | 43 | 30.00 | 30.00 | 30.00
3 | 64 | 24 | 53 | 32.50 | 47.00 | 36.50
4 | 53 | 53 | 63 | 48.25 | 35.50 | 44.75
5 | 13 | 31 | 24 | 50.63 | 44.25 | 53.88
6 | 24 | 14 | 35 | 31.81 | 37.63 | 38.94
7 | 53 | 32 | 64 | 27.91 | 25.81 | 36.97
8 | 45 | 43 | 52 | 40.45 | 28.91 | 50.48
9 | 65 | 352 | 22 | 42.73 | 35.95 | 51.24
10 | 75 | 124 | 3402 | 53.86 | 193.98 | 36.62
11 | 14 | 464 | 35 | 64.43 | 158.99 | 1719.31
12 | 24 | 32 | 45 | 39.22 | 311.49 | 877.16
13 | 26 | 35 | 452 | 31.61 | 171.75 | 461.08
14 | 63 | 131 | 53 | 28.80 | 103.37 | 456.54
15 | 134 | 24 | 234 | 45.90 | 117.19 | 254.77
16 | 1353 | 531 | 53 | 89.95 | 70.59 | 244.38
17 | 314 | 132 | 522 | 721.48 | 300.80 | 148.69
- | - | - | - | 517.74 | 216.40 | 335.35
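The slide does not state which constants produced the table, but the numbers appear consistent with λ = 0.5 and a starting value of 30, with each row's EWMA computed from the previous row's observation. The sketch below is a reading of the table under those assumptions, not something the slide confirms. It reproduces the first few EWMA values for node1; the function name `ewma_series` is chosen for this example.

```python
def ewma_series(observations, lam=0.5, initial=30.0):
    """Apply EWMA_t = lam * Y_t + (1 - lam) * EWMA_{t-1} to each observation.

    `lam` and `initial` are inferred from the table, not given by the slide.
    The returned list starts with the initial value (EWMA_0).
    """
    values = [initial]
    for y in observations:
        values.append(lam * y + (1 - lam) * values[-1])
    return values


# node1 observations from rows T=2..5 of the table
node1 = [35, 64, 53, 13]
print(ewma_series(node1))  # [30.0, 32.5, 48.25, 50.625, 31.8125]
```

Rounded to two decimals, these match the node1 EWMA entries in rows 2 to 6 (30.00, 32.50, 48.25, 50.63, 31.81). The same recurrence explains why the single 3402 observation on node3 pushes its EWMA to 1719.31 in the following row and takes several rows to decay.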


Traffic Management

Feature | Istio | Istio comments | Linkerd | Linkerd comments
Retry Budgets | No | - | Yes | Used to avoid retry storms and unnecessary retries.
Timeouts | Yes | - | Yes | -
Fault Injection | Yes | - | No | -
Ingress | Yes | Provided by the Istio ingress-gateway; other gateways are supported as well. | See comments | Linkerd does not ship its own Ingress proxy but can be configured to work with popular options such as Nginx, Gloo, and others.
Traffic Filters | Yes | Custom Envoy filters can be added to the chain. | No | -
External Routing | Yes | - | Yes | -
Header-based matching | Yes | - | No | Only path-based matching
Add/Change/Remove custom headers | Yes | - | No | -


Traffic Management

Points to Consider

- Fault Injection

- Custom Envoy Filters

- Header Based Matching

- Add / Remove Headers


Custom Envoy Filter - Gloo Example


Traffic Management

Retry Budgets and Retry Storm

A retry storm is an undesirable client/server failure mode where one or more peers become unhealthy, causing clients to retry a significant fraction of requests. This has the effect of multiplying the volume of traffic sent to the unhealthy peers, exacerbating the problem.
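To make the multiplication effect concrete, the following sketch estimates how many attempts a single logical request generates when every failed attempt is retried. It assumes attempts fail independently with the same probability, which is a simplification introduced here for illustration; the function name `expected_attempts` is not from the talk.

```python
def expected_attempts(failure_rate: float, max_retries: int) -> float:
    """Expected attempts per logical request.

    Attempt k (k = 0 is the original call) only happens if the k previous
    attempts all failed, which has probability failure_rate ** k under the
    independence assumption.
    """
    return sum(failure_rate ** k for k in range(max_retries + 1))


# A fully unhealthy peer with 3 retries per request receives 4x the traffic.
print(expected_attempts(1.0, 3))  # 4.0
# A peer failing half of its requests still sees almost double the load.
print(expected_attempts(0.5, 3))  # 1.875
```

The unhealthy peer is hit hardest exactly when it is least able to cope, which is the behavior a retry budget is meant to cap.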


[Diagram: booking and payments, with failed requests marked "!" multiplying as they are retried]

- Retry Ratio: the number of retries allowed relative to the number of requests, for example 20%.
- TTL: how long requests should be considered when calculating the retry ratio.
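A retry budget built from these two parameters can be sketched as a sliding window. This is a minimal illustration under stated assumptions, not Linkerd's implementation: the `RetryBudget` class and its method names are hypothetical, timestamps are passed in explicitly to keep the example deterministic, and Linkerd's own budget also guarantees a minimum number of retries per second, which is omitted here.

```python
from collections import deque


class RetryBudget:
    """Hypothetical sliding-window retry budget (ratio + TTL only)."""

    def __init__(self, retry_ratio: float = 0.2, ttl_seconds: float = 10.0):
        self.retry_ratio = retry_ratio
        self.ttl_seconds = ttl_seconds
        self._requests = deque()  # timestamps of original requests
        self._retries = deque()   # timestamps of retries already granted

    def _expire(self, now: float) -> None:
        # Drop entries older than the TTL from both windows.
        for window in (self._requests, self._retries):
            while window and now - window[0] > self.ttl_seconds:
                window.popleft()

    def record_request(self, now: float) -> None:
        """Register an original (non-retry) request."""
        self._expire(now)
        self._requests.append(now)

    def try_retry(self, now: float) -> bool:
        """Grant a retry only if it keeps retries within the ratio."""
        self._expire(now)
        allowed = self.retry_ratio * len(self._requests)
        if len(self._retries) + 1 <= allowed:
            self._retries.append(now)
            return True
        return False
```

With a ratio of 20%, ten recent requests allow at most two retries inside the TTL window, so a failing payments service sees at most 1.2 times the original load from booking instead of a multiple of it.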


Security

Feature | Istio | Istio comments | Linkerd
Supports mTLS | Yes | - | Yes
TLS on By Default | See comments | Istio instructions include details on how to install with TLS in both permissive and strict mode | Yes
Certificate Rotation | Yes | - | Yes
External Root Certificate Support | Yes | - | Yes

Both technologies support Mutual TLS and can rely on external Root certificates.


Installation and Configuration

Feature | Istio | Linkerd | Comments
Prerequisites check | No | Yes | linkerd check --pre
Requires Sidecar | No | Yes | -
Supports automatic Sidecar Injection | Yes | Yes | -

For Linkerd, the pre-check (linkerd check --pre) verifies that you have permission to create the Kubernetes resources required during the install process.


Supported Environments and Deployment Models

Feature | Istio | Istio comments | Linkerd
Kubernetes | Yes | - | Yes
Non-Kubernetes | Yes | Virtual Machines, Cloud Foundry, Consul/Nomad | No
Multi-cluster Support - Multiple Control Planes | Yes | - | Yes
Multi-cluster Support - Single Control Plane | Yes | - | No

Points to Consider

- Linkerd 2.3 supports only Kubernetes.

- Both support multi-cluster with multiple control planes.

- Istio handles more complex multi-cluster scenarios.


Istio - Multi-Cluster - Multiple Control Planes


Istio - Multi-Cluster - Single Control Plane - VPN


Istio - Multi-Cluster - Single Control Plane - Border Gateways


Observability

Feature | Istio | Istio comments | Linkerd | Linkerd comments
Admin Dashboard | No | - | Yes | -
Observability Dashboard | Yes | Includes Kiali | Yes | Includes the Linkerd dashboard and also pre-configured Grafana dashboards
Tracing | Yes | - | No | Tracing can still be achieved by instrumenting applications. For debugging purposes, the Tap feature allows you to 'listen' to traffic on a resource.

Point to Consider

- Linkerd does not have Distributed Tracing but has a Tap feature.


Policy Management

Template | Provider
API Key | -
Analytics | Apigee
Authorization | Apigee, OPA
Check Nothing | Denier
Edge | -
Kubernetes | Kubernetes Env
List Entry | Denier, List
Log Entry | Fluentd, SolarWinds, Stackdriver, Stdio
Metric | Apache SkyWalking, Circonus, CloudWatch, Datadog, Prometheus, SignalFx, SolarWinds, Stackdriver, StatsD, Stdio, Wavefront by VMware
Quota | Denier, Memory quota, Redis Quota
Report Nothing | -
Trace Span | SignalFx, Stackdriver, Zipkin


Policy Management

Feature | Istio | Linkerd | Comments
OIDC/OAuth2 | Yes | No | Principal authentication is delegated to the applications
Rate Limits | Yes | No | -
Adapter Support | Yes | No | -

Point to Consider

- Linkerd does not have a policy management system such as Istio's. Policy needs to be implemented at the Ingress or application level.


Performance

On the server side, the Istio/Envoy sidecar uses ~60% more CPU than Linkerd.

Source: https://medium.com/@michael_87395/benchmarking-istio-linkerd-cpu-c36287e32781


On the Linkerd2-meshed setup, the p99.9 latency (red) ranged from 8.0 ms to 12.0 ms.

On the Istio-meshed setup, the p99.9 latency (red) ranged from 35.0 ms to 55.0 ms. The p99 latency (orange) fell in the range of 22.6 ms to 27.2 ms.

Source: https://medium.com/@ihcsim/linkerd-2-0-and-istio-performance-benchmark-df290101c2bb


How to Know if You Need a Service Mesh

01 Running Distributed Systems

02 Advanced CI/CD Pipelines

03 Service Availability / SLA

04 Multiple Language Platforms

05 Service Governance


Recommended Tools

SuperGloo (supergloo.solo.io)

$ supergloo install istio

$ supergloo install linkerd

Kiali (kiali.io): Service Mesh Observability

Flagger (flagger.app): Flagger is a Kubernetes operator that automates the promotion of canary deployments using Istio or App Mesh routing for traffic shifting and Prometheus metrics for canary analysis.


Thank you.