Transcript of Reducing Microservice Complexity with Kafka and Reactive Streams
Jim Riecken, Specialist Software Developer
@jimriecken - [email protected]
Agenda
• Monolith to Microservices + Complexity
• Asynchronous Messaging
• Kafka
• Reactive Streams + Akka Streams
Anti-Agenda
• Details on how to set up a Kafka cluster
• In-depth tutorial on Akka Streams
Monolith to Microservices
[Charts: Efficiency vs. Time - a monolith (M) compared with a system split into services (S1-S5)]
• Small
• Scalable
• Independent
• Easy to Create
• Clear ownership
Network Calls
• Latency
• Failure
Reliability
A request that traverses a chain of services, each 99.9% reliable, is only ~99.5% reliable end-to-end (99.9% × 99.9% × 99.9% × 99.9% × …).
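The arithmetic behind that figure is easy to check; a minimal sketch (the 99.9% per-service reliability comes from the slide, the five-hop chain length is an assumption):

```scala
// Compound reliability of a synchronous call chain, assuming independent
// failures: each network hop multiplies in.
def chainReliability(perService: Double, hops: Int): Double =
  math.pow(perService, hops)

// Five services at 99.9% each leave only ~99.5% end-to-end reliability.
val endToEnd = chainReliability(0.999, 5)
println(f"$endToEnd%.4f")
```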
Coordination
• Between services
• Between teams
Asynchronous Messaging
Message Bus
[Diagram: synchronous service-to-service calls vs. asynchronous messaging through a message bus]

Why?
• Decoupling
• Pub/Sub
• Less coordination
• Additional consumers are easy
• Helps scale the organization
Messaging Requirements
• Well-defined delivery semantics
• High-Throughput
• Highly-Available
• Durable
• Scalable
• Backpressure
Kafka
What is Kafka?
• Distributed, partitioned, replicated commit log service
• Pub/Sub messaging functionality
• Created by LinkedIn, now an Apache open-source project
Topics + Partitions
[Diagram: producers send to the Kafka brokers; a topic is split into partitions (P0, P1, P2), each an append-only log of offset-numbered messages with new messages appended at the tail; consumers read from the partitions]
Producers
• Send messages to topics
• Responsible for choosing which partition to send to
  • Round-robin
  • Consistent hashing based on a message key
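Those two strategies can be sketched in a few lines (a hand-rolled illustration, not the actual Kafka client's partitioner):

```scala
// Sketch of the two partition-selection strategies: round-robin for keyless
// messages, key hashing when per-key ordering matters.
object PartitionChooser {
  private var counter = -1

  // Round-robin: no key - spread messages evenly across partitions.
  def roundRobin(numPartitions: Int): Int = {
    counter += 1
    counter % numPartitions
  }

  // Keyed: hash the message key so the same key always lands on the
  // same partition (preserving per-key ordering within that partition).
  def byKey(key: Array[Byte], numPartitions: Int): Int =
    math.abs(java.util.Arrays.hashCode(key)) % numPartitions
}

val p1 = PartitionChooser.byKey("user-42".getBytes, 3)
val p2 = PartitionChooser.byKey("user-42".getBytes, 3)
// p1 == p2: the same key always maps to the same partition
```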
Consumers
• Pull messages from topics
• Track their own offset in each partition
[Diagram: consumers 1-6, organized into consumer groups 1 and 2, reading from partitions P0-P2 of a topic]
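A toy model of this consumer-side offset tracking (all names hypothetical; a real consumer commits its offsets back to Kafka or an external store):

```scala
// Toy model of "consumers track their own offset": the partition is an
// append-only log, and each consumer group keeps its own position in it -
// the broker never tracks what has been consumed.
class PartitionLog[T] {
  private val entries = scala.collection.mutable.ArrayBuffer.empty[T]
  def append(t: T): Unit = entries += t
  // Read everything at or after `offset`.
  def readFrom(offset: Long): Seq[T] = entries.drop(offset.toInt).toSeq
}

class GroupConsumer[T](log: PartitionLog[T]) {
  private var offset = 0L          // this group's position in the partition
  def poll(): Seq[T] = {
    val msgs = log.readFrom(offset)
    offset += msgs.size            // "commit": advance our own offset
    msgs
  }
}

val log = new PartitionLog[String]
Seq("a", "b", "c").foreach(log.append)
val group1 = new GroupConsumer(log)
val group2 = new GroupConsumer(log)
val first  = group1.poll()   // reads a, b, c
log.append("d")
val second = group1.poll()   // group 1 resumes where it left off: just d
val other  = group2.poll()   // group 2 has its own offset: a, b, c, d
```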
How does Kafka meet the requirements?
Kafka is Fast
• Hundreds of MB/s of reads/writes from thousands of concurrent clients
• LinkedIn (2015)
  • 800 billion messages per day (18 million/s peak)
  • 175 TB of data produced per day
  • > 1000 servers in 60 clusters
Kafka is Resilient
• Brokers
  • All data is persisted to disk
  • Partitions replicated to other nodes
• Consumers
  • Start where they left off
• Producers
  • Can retry - at-least-once messaging
Kafka is Scalable
• Capacity can be added at runtime with zero downtime
• More servers => more disk space
  • Topics can be larger than any single node could hold
• Additional partitions can be added to add more parallelism
Kafka Helps with Back-Pressure
• Large storage capacity
  • Topic retention is a Consumer SLA
• Almost impossible for a fast producer to overload a slow consumer
• Allows real-time as well as batch consumption
Message Data Format
Messages
• Array[Byte]
• Serialization?
  • JSON?
  • Protocol Buffers
    • Binary - Fast
    • IDL - Code Generation
    • Message evolution
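Whatever format is chosen, Kafka itself only ever sees Array[Byte]; a minimal sketch of that serialization boundary (these hand-rolled traits merely stand in for the client library's serializer interfaces):

```scala
// Kafka stores and transports opaque byte arrays; turning domain values
// into bytes and back is entirely the client's job.
import java.nio.charset.StandardCharsets.UTF_8

trait Serializer[T]   { def serialize(t: T): Array[Byte] }
trait Deserializer[T] { def deserialize(bytes: Array[Byte]): T }

object StringSerde extends Serializer[String] with Deserializer[String] {
  def serialize(s: String): Array[Byte] = s.getBytes(UTF_8)
  def deserialize(bytes: Array[Byte]): String = new String(bytes, UTF_8)
}

val wire = StringSerde.serialize("hello")   // what actually hits the topic
val back = StringSerde.deserialize(wire)    // round-trips to the original value
```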
Processing Data with Reactive Streams
Reactive Streams
• Standard for async stream processing with non-blocking back-pressure
  • Subscriber signals demand to publisher
  • Publisher sends no more than demand
• Low-level
  • Mainly meant for library authors
Publisher[T]
  subscribe(s: Subscriber[-T])

Subscriber[T]
  onSubscribe(s: Subscription)
  onNext(t: T)
  onComplete()
  onError(t: Throwable)

Subscription
  request(n: Long)
  cancel()
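A self-contained sketch of this demand protocol (simplified to a synchronous, single-subscriber case; the real Reactive Streams specification has many more rules):

```scala
// Minimal stand-ins for the three interfaces above, plus a publisher that
// never emits more elements than the subscriber has requested.
trait Subscription { def request(n: Long): Unit; def cancel(): Unit }
trait Subscriber[T] {
  def onSubscribe(s: Subscription): Unit
  def onNext(t: T): Unit
  def onComplete(): Unit
  def onError(t: Throwable): Unit
}
trait Publisher[T] { def subscribe(s: Subscriber[T]): Unit }

class RangePublisher(to: Int) extends Publisher[Int] {
  def subscribe(sub: Subscriber[Int]): Unit = {
    var next = 1
    var done = false
    sub.onSubscribe(new Subscription {
      def request(n: Long): Unit = {
        var remaining = n   // emit at most `n` elements per request
        while (remaining > 0 && next <= to && !done) {
          sub.onNext(next); next += 1; remaining -= 1
        }
        if (next > to && !done) { done = true; sub.onComplete() }
      }
      def cancel(): Unit = done = true
    })
  }
}

val received = scala.collection.mutable.ArrayBuffer.empty[Int]
var completed = false
new RangePublisher(3).subscribe(new Subscriber[Int] {
  def onSubscribe(s: Subscription): Unit = s.request(2)  // signal demand for 2
  def onNext(t: Int): Unit = received += t
  def onComplete(): Unit = completed = true
  def onError(t: Throwable): Unit = ()
})
// received holds only 1 and 2: just the demanded elements were delivered
```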
Processing Data with Akka Streams
Akka Streams
• Library on top of Akka Actors and Reactive Streams
• Process sequences of elements using bounded buffer space
• Strongly Typed
Concepts
• Source
• Flow
• Sink
• FanOut
• FanIn
Concepts
• Runnable Graph
Composition
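As a rough analogy for how these shapes compose (plain Scala standing in for the Akka Streams API, where the equivalent would be `source.via(flow).to(sink)`):

```scala
// Plain-Scala analogy for stream-shape composition: a Source has one
// output, a Flow one input and one output, a Sink one input (and it
// materializes a result when the graph runs).
type Source[O]  = () => Iterator[O]
type Flow[I, O] = Iterator[I] => Iterator[O]
type Sink[I, R] = Iterator[I] => R

// Source + Flow => a new Source (still open at the downstream end).
def via[A, B](src: Source[A], flow: Flow[A, B]): Source[B] =
  () => flow(src())

// Source + Sink => a runnable graph; running it yields the Sink's value.
def runWith[A, R](src: Source[A], sink: Sink[A, R]): R =
  sink(src())

val numbers: Source[Int]    = () => (1 to 5).iterator
val double:  Flow[Int, Int] = _.map(_ * 2)
val sum:     Sink[Int, Int] = _.sum

val result = runWith(via(numbers, double), sum)  // (1+2+3+4+5)*2
```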
Materialization
• Turning on the tap
  • Create actors
  • Open files/sockets/other resources
• Materialized values
  • Source: Actor, Promise, Subscriber
  • Sink: Actor, Future, Producer
Reactive Kafka
• https://github.com/akka/reactive-kafka
• Akka Streams wrapper around the Kafka API
  • Consumer Source
  • Producer Sink
Producer
• Sink - sends messages to a Kafka topic
• Flow - sends messages to a Kafka topic + emits results downstream
• When the stream completes/fails, the connection to Kafka is automatically closed
Consumer
• Source - pulls messages from Kafka topics
• Offset Management
• Back-pressure
• Materialization
  • Object that can stop the consumer (and complete the stream)
Simple Producer Example

  implicit val system = ActorSystem("producer-test")
  implicit val materializer = ActorMaterializer()

  val producerSettings = ProducerSettings(
    system, new ByteArraySerializer, new StringSerializer
  ).withBootstrapServers("localhost:9092")

  Source(1 to 100)
    .map(i => s"Message $i")
    .map(m => new ProducerRecord[Array[Byte], String]("lower", m))
    .to(Producer.plainSink(producerSettings))
    .run()
Simple Consumer Example

  implicit val system = ActorSystem("consumer-test")
  implicit val materializer = ActorMaterializer()

  val consumerSettings = ConsumerSettings(
    system, new ByteArrayDeserializer, new StringDeserializer
  ).withBootstrapServers("localhost:9092").withGroupId("test-group")

  val control = Consumer.atMostOnceSource(
    consumerSettings.withClientId("client1"), Subscriptions.topics("lower"))
    .map(record => record.value)
    .to(Sink.foreach(v => println(v)))
    .run()

  // Later: stop the consumer (and complete the stream)
  control.stop()
Combined Example

  val control = Consumer.committableSource(
    consumerSettings.withClientId("client1"), Subscriptions.topics("lower"))
    .map { msg =>
      val upper = msg.value.toUpperCase
      ProducerMessage.Message(
        new ProducerRecord[Array[Byte], String]("upper", upper),
        msg.committableOffset)
    }
    .to(Producer.committableSink(producerSettings))
    .run()

  control.stop()
Demo
Wrap-Up
• Microservices have many advantages, but they can introduce failure and complexity.
• Asynchronous messaging can help reduce this complexity, and Kafka is a great option.
• Akka Streams makes reliably processing data from Kafka, with back-pressure, easy.
Thank you! Questions?
Jim Riecken - @jimriecken - [email protected]