
Post on 10-Mar-2018



Reducing Microservice Complexity with Kafka and Reactive Streams

Jim Riecken, Specialist Software Developer

@jimriecken - jim.riecken@hootsuite.com


• Monolith to Microservices + Complexity
• Asynchronous Messaging
• Kafka
• Reactive Streams + Akka Streams

Agenda

• Details on how to set up a Kafka cluster
• In-depth tutorial on Akka Streams

Anti-Agenda

Monolith to Microservices

[Charts: development efficiency over time for a monolith (M) versus a set of small services (S1-S5)]

• Small
• Scalable
• Independent
• Easy to Create
• Clear ownership

Network Calls

• Latency
• Failure

Reliability

99.9% × 99.9% × 99.9% × 99.9% × 99.9% ≈ 99.5%

A request chained through five services that are each 99.9% reliable is only ~99.5% reliable end-to-end.
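The compounding is easy to verify. A minimal sketch, assuming five synchronous hops at 99.9% availability each (illustrative numbers):

```scala
// Illustrative numbers: a request passing through 5 services,
// each 99.9% available on its own.
val perService = 0.999
val hops = 5

// End-to-end success requires every hop to succeed,
// so the per-hop probabilities multiply.
val endToEnd = math.pow(perService, hops)

println(f"${endToEnd * 100}%.2f%%") // prints 99.50%
```

Every extra synchronous dependency in the call path multiplies in another failure probability, which is one motivation for taking work off the request path with asynchronous messaging.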

Coordination

• Between services
• Between teams

Asynchronous Messaging

Message Bus

Synchronous

Asynchronous

• Decoupling
• Pub/Sub

• Less coordination
• Additional consumers are easy
• Helps scale the organization

Why?

• Well-defined delivery semantics
• High-Throughput
• Highly-Available
• Durable
• Scalable
• Backpressure

Messaging Requirements

Kafka

• Distributed, partitioned, replicated commit log service
• Pub/Sub messaging functionality
• Created by LinkedIn, now an Apache open-source project

What is Kafka?

Producers

Kafka Brokers

Consumers

Topic

P0: 0 | 1 | 2 | 3 | 4 | 5
P1: 0 | 1 | 2 | 3 | 4 | 5 | 6
P2: 0 | 1 | 2 | 3

New messages are appended to the end of each partition

Topics + Partitions
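Each partition behaves like an append-only log in which every message receives the next sequential offset. A toy model in plain Scala (not the Kafka API) to show the idea:

```scala
import scala.collection.mutable.ArrayBuffer

// Toy model of one partition: an append-only log of messages,
// addressed by sequential offsets starting at 0.
class Partition {
  private val log = ArrayBuffer.empty[String]

  // Append returns the offset the message was written at.
  def append(msg: String): Long = { log += msg; (log.size - 1).toLong }

  def read(offset: Long): String = log(offset.toInt)
  def endOffset: Long = log.size.toLong
}

val p0 = new Partition
p0.append("a") // offset 0
p0.append("b") // offset 1
println(p0.read(0) + ", end offset " + p0.endOffset) // prints: a, end offset 2
```

Because messages are only ever appended, any number of consumers can read the same partition independently, each at its own offset.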

• Send messages to topics
• Responsible for choosing which partition to send to
  • Round-robin
  • Consistent hashing based on a message key

Producers
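The two partition-selection strategies above can be sketched as follows. This is an illustration only, not Kafka's actual DefaultPartitioner (which hashes the serialized key bytes with murmur2):

```scala
val numPartitions = 3

// Keyed messages: hash the key, so the same key always maps to the
// same partition, preserving per-key ordering.
def partitionForKey(key: String): Int =
  math.abs(key.hashCode % numPartitions)

// Unkeyed messages: spread load by cycling through the partitions.
var counter = -1
def nextRoundRobin(): Int = { counter += 1; counter % numPartitions }

println(partitionForKey("user-42") == partitionForKey("user-42")) // true: stable
println(Seq.fill(4)(nextRoundRobin())) // prints List(0, 1, 2, 0)
```

The practical consequence: if ordering matters for a given entity (e.g. all events for one user), key the messages by that entity so they land in one partition.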

• Pull messages from topics
• Track their own offset in each partition

Consumers
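The pull model with self-tracked offsets can be sketched with a plain collection standing in for a partition (hypothetical names, not the consumer API):

```scala
// A partition's contents, and this consumer's own position in it.
val partition = Vector("m0", "m1", "m2", "m3")
var offset = 0

// Pull up to `max` messages starting at the tracked offset,
// then advance (i.e. "commit") the offset past them.
def poll(max: Int): Vector[String] = {
  val batch = partition.slice(offset, offset + max)
  offset += batch.size
  batch
}

println(poll(3)) // prints Vector(m0, m1, m2)
println(poll(3)) // prints Vector(m3): resumes where it left off
```

Because the consumer, not the broker, owns the offset, a restarted consumer simply resumes from its last committed position, and a slow consumer never forces the broker to buffer on its behalf.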

[Diagram: a topic with partitions P0-P2, read by consumers 1-6 split across consumer Group 1 and Group 2]

How does Kafka meet the requirements?

• Hundreds of MB/s of reads/writes from thousands of concurrent clients
• LinkedIn (2015)
  • 800 billion messages per day (18 million/s peak)
  • 175 TB of data produced per day
  • > 1000 servers in 60 clusters

Kafka is Fast

• Brokers
  • All data is persisted to disk
  • Partitions replicated to other nodes
• Consumers
  • Start where they left off
• Producers
  • Can retry - at-least-once messaging

Kafka is Resilient

• Capacity can be added at runtime with zero downtime
• More servers => more disk space
  • Topics can be larger than any single node could hold
• Additional partitions can be added to add more parallelism

Kafka is Scalable

• Large storage capacity
  • Topic retention is a Consumer SLA
• Almost impossible for a fast producer to overload a slow consumer
• Allows real-time as well as batch consumption

Kafka Helps with Back-Pressure

Message Data Format

• Array[Byte]
• Serialization?
  • JSON?
  • Protocol Buffers
    • Binary - Fast
    • IDL - Code Generation
    • Message evolution

Messages

Processing Data with Reactive Streams

• Standard for async stream processing with non-blocking back-pressure
  • Subscriber signals demand to publisher
  • Publisher sends no more than the demanded amount
• Low-level
  • Mainly meant for library authors

Reactive Streams

Publisher[T]
  subscribe(s: Subscriber[-T])

Subscriber[T]
  onSubscribe(s: Subscription)
  onNext(t: T)
  onComplete()
  onError(t: Throwable)

Subscription
  request(n: Long)
  cancel()
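These interfaces are mirrored one-for-one by `java.util.concurrent.Flow` in JDK 9+, so a demand-respecting publisher can be sketched without any library. A toy, fully synchronous implementation:

```scala
import java.util.concurrent.Flow

// Publishes 1..n, emitting only as much as the subscriber has requested.
class RangePublisher(n: Int) extends Flow.Publisher[Int] {
  def subscribe(s: Flow.Subscriber[_ >: Int]): Unit =
    s.onSubscribe(new Flow.Subscription {
      private var next = 1
      private var done = false
      def request(demand: Long): Unit = {
        var remaining = demand
        while (remaining > 0 && next <= n) {
          val v = next; next += 1; remaining -= 1
          s.onNext(v) // may reentrantly call request() for more demand
        }
        if (!done && next > n) { done = true; s.onComplete() }
      }
      def cancel(): Unit = ()
    })
}

// Subscriber that signals demand one element at a time.
class OneAtATime extends Flow.Subscriber[Int] {
  val received = scala.collection.mutable.ListBuffer.empty[Int]
  private var sub: Flow.Subscription = _
  def onSubscribe(s: Flow.Subscription): Unit = { sub = s; sub.request(1) }
  def onNext(i: Int): Unit = { received += i; sub.request(1) }
  def onComplete(): Unit = ()
  def onError(t: Throwable): Unit = throw t
}

val sub = new OneAtATime
new RangePublisher(3).subscribe(sub)
println(sub.received.toList) // prints List(1, 2, 3)
```

Note how back-pressure falls out of the protocol: the publisher emits nothing until `request(n)` arrives, so a slow subscriber automatically throttles a fast publisher.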

Processing Data with Akka Streams

• Library on top of Akka Actors and Reactive Streams

• Process sequences of elements using bounded buffer space

• Strongly Typed

Akka Streams

Source
Flow
Sink
FanOut
FanIn

Concepts
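The shapes above compose like functions. A toy model in plain Scala (hypothetical stand-ins, not the Akka Streams types) to show the idea:

```scala
// Toy stand-ins: a Source produces elements, a Flow transforms them,
// a Sink consumes them and produces a final result.
type Source[A]   = LazyList[A]
type Flow[A, B]  = LazyList[A] => LazyList[B]
type Sink[A, R]  = LazyList[A] => R

val source: Source[Int]              = LazyList.from(1).take(5)
val flow: Flow[Int, String]          = _.map(i => s"msg-$i")
val sink: Sink[String, List[String]] = _.toList

// "Running the graph" is composing the pieces and forcing the lazy chain.
val result = sink(flow(source))
println(result) // prints List(msg-1, msg-2, msg-3, msg-4, msg-5)
```

In Akka Streams the same composition is written as `source.via(flow).to(sink)`, producing a RunnableGraph that does nothing until it is materialized with `run()`.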

Runnable Graph

Concepts

Composition

• Turning on the tap
  • Create actors
  • Open files/sockets/other resources
• Materialized values
  • Source: Actor, Promise, Subscriber
  • Sink: Actor, Future, Producer

Materialization

Reactive Kafka

• https://github.com/akka/reactive-kafka
• Akka Streams wrapper around the Kafka API
  • Consumer Source
  • Producer Sink

Reactive Kafka

• Sink - sends messages to a Kafka topic
• Flow - sends messages to a Kafka topic and emits results downstream
• When the stream completes or fails, the connection to Kafka is automatically closed

Producer

• Source - pulls messages from Kafka topics
• Offset Management
• Back-pressure
• Materialization
  • Object that can stop the consumer (and complete the stream)

Consumer

Simple Producer Example

implicit val system = ActorSystem("producer-test")
implicit val materializer = ActorMaterializer()

val producerSettings = ProducerSettings(
  system, new ByteArraySerializer, new StringSerializer
).withBootstrapServers("localhost:9092")

Source(1 to 100)
  .map(i => s"Message $i")
  .map(m => new ProducerRecord[Array[Byte], String]("lower", m))
  .to(Producer.plainSink(producerSettings))
  .run()

Simple Consumer Example

implicit val system = ActorSystem("consumer-test")
implicit val materializer = ActorMaterializer()

val consumerSettings = ConsumerSettings(
  system, new ByteArrayDeserializer, new StringDeserializer
).withBootstrapServers("localhost:9092").withGroupId("test-group")

val control = Consumer.atMostOnceSource(
    consumerSettings.withClientId("client1"), Subscriptions.topics("lower"))
  .map(record => record.value)
  .to(Sink.foreach(v => println(v)))
  .run()

// Later: stop the consumer and complete the stream
control.stop()

val control = Consumer.committableSource(
    consumerSettings.withClientId("client1"), Subscriptions.topics("lower"))
  .map { msg =>
    val upper = msg.value.toUpperCase
    ProducerMessage.Message(
      new ProducerRecord[Array[Byte], String]("upper", upper),
      msg.committableOffset)
  }
  .to(Producer.committableSink(producerSettings))
  .run()

control.stop()

Combined Example

Demo

Wrap-Up

• Microservices have many advantages, but can introduce failure and complexity.
• Asynchronous messaging can help reduce this complexity, and Kafka is a great option.
• Akka Streams makes reliably processing data from Kafka with back-pressure easy.


Thank you! Questions?

Jim Riecken, @jimriecken - jim.riecken@hootsuite.com