Staging Reactive data pipelines using Simon Souter...
![Page 1: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/1.jpg)
![Page 2: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/2.jpg)
Jaakko Pallari (@lepovirta)
Simon Souter (@simonsouter)
Staging Reactive data pipelines using Kafka as the backbone
/cakesolutions /scala-kafka-client
![Page 3: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/3.jpg)
MANCHESTER LONDON NEW YORK
Reactive Solutions at Cake
![Page 4: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/4.jpg)
Contents
1. Reactive Data Pipelines
2. Kafka as a Reactive Message Queue
3. Architecture & Consumer Patterns
4. Streaming Application Development
![Page 5: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/5.jpg)
Stream Processing
● Big Data
● Processing in Real-time
● Event Throughput vs Number of Queries
● IoT
[Diagram: Source, Service, Sink]
![Page 6: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/6.jpg)
Distributed Streaming Engines
● Server Applications
● Stream topologies deployed to cluster
● Framework design
![Page 7: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/7.jpg)
Streaming from ground-up
● Custom Streaming Applications
● Leverage existing tool stack
[Diagram: Source, Application, Sink]
![Page 8: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/8.jpg)
Staged data pipelines
● Staged Event Driven Architecture
● Processes separated by a queue
● Processing in stages
[Diagram: Process and Queue stages alternating]
![Page 9: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/9.jpg)
Reactive data pipelines
● Responsive
● Resilient
● Elastic
● Message Driven
[Diagram: Source, Process, Queue, Process, Sink]
![Page 10: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/10.jpg)
Streaming from ground-up
● Microservices as processing components
[Diagram: Source, Microservice 1 (three instances), Queue, Microservice 2 (three instances), Sink]
![Page 11: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/11.jpg)
Streaming from ground-up
● Deployment via cluster orchestration services
[Diagram: Source, Microservice 1 (three instances), Queue, Microservice 2 (three instances), Sink; Orchestration Service with a "Scale up" action]
![Page 12: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/12.jpg)
Streaming from ground-up
● Messaging middleware for resilient data distribution between microservices
[Diagram: Source, Microservice 1 (three instances), Queue, Microservice 2 (three instances), Sink]
![Page 13: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/13.jpg)
What is Kafka?
● Distributed Message Broker
● Supports Parallel Streaming
● Kafka as a Reactive MQ
[Diagram: Source, Microservice 1 (three instances), Kafka, Microservice 2 (three instances), Sink]
![Page 14: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/14.jpg)
Kafka: topic and message anatomy
Message Driven
[Diagram: Kafka Topic “Electric_Readings” carrying a message with Key: “meter1”, Value: 1.34; consumers: Electric Bill Calculation, Auditing]
![Page 15: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/15.jpg)
Kafka: at-least-once delivery
Resilient
[Diagram: Electric meter, Kafka Topic “Electric_Readings”, Consumption Aggregator; each Deliver is followed by an ACK]
![Page 16: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/16.jpg)
Kafka: clustering - arrangement
Elastic
[Diagram: Kafka Topic with Partition 1 and Partition 2, hosted on Kafka node 1 and Kafka node 2]
![Page 17: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/17.jpg)
Kafka: clustering - replication
Resilient
[Diagram: Kafka Topic with Partition 1 and Partition 2 on Kafka node 1 and Kafka node 2, plus Partition 1 Replica and Partition 2 Replica]
![Page 18: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/18.jpg)
Kafka: clustering - consumer
Responsive
[Diagram: Kafka Topic with Partition #1, #2, #3 read by Consumer #1, #2, #3, all in the same consumer group]
![Page 23: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/23.jpg)
Kafka: clustering - consumer
Responsive
[Diagram: Kafka Topic with Partition #1, #2, #3 and Consumer #1, #2, #3, #4; Consumer #4 receives no data]
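The consumer clustering slides can be illustrated with a toy assignment function. This is a minimal sketch that assumes a plain round-robin strategy; `assign_partitions` and the consumer names are hypothetical and do not reproduce Kafka's actual assignors.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin.

    Each partition goes to exactly one consumer; any consumer beyond
    the partition count ends up with an empty list (no data).
    """
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [1, 2, 3]
three = assign_partitions(partitions, ["consumer-1", "consumer-2", "consumer-3"])
four = assign_partitions(
    partitions, ["consumer-1", "consumer-2", "consumer-3", "consumer-4"]
)
two = assign_partitions(partitions, ["consumer-1", "consumer-2"])
```

With three partitions, a fourth consumer in the same group is left idle, while dropping to two consumers makes one of them take over a second partition.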
![Page 24: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/24.jpg)
Kafka: high throughput
● Single partition consumer: 20-90 Mb/sec
Responsive
![Page 25: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/25.jpg)
Kafka the Reactive MQ
Message Driven
● Key-value messages
Responsive
● Consumer clustering
● High throughput
Resilient
● At-least-once delivery
● Replication
Elastic
● Linear scalability
![Page 26: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/26.jpg)
Kafka consumer patterns
[Diagram: Source, Microservice 1 (three instances), Kafka, Microservice 2 (three instances), Sink]
![Page 27: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/27.jpg)
Simple message queue
[Diagram: Electric Meter, topic Electric Readings with one Partition and two Partition replicas, Auditing]
Kafka Terminology:
- Partition Count: 1
![Page 28: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/28.jpg)
Simple message queue - fanout
[Diagram: Electric Meter, topic Electric Readings with one Partition and two Partition replicas, consumed by Auditing and Billing]
Kafka Terminology:
- Partition Count: 1
- Multiple Consumer Groups
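Fanout relies on every consumer group keeping its own position in the same log. The sketch below is an in-memory toy, not a Kafka client: the `Topic` class and its `publish`, `poll` and `commit` methods are hypothetical names chosen for illustration.

```python
class Topic:
    """Toy single-partition topic: an append-only log plus one
    committed offset per consumer group."""

    def __init__(self):
        self.log = []
        self.committed = {}  # group name -> next offset to read

    def publish(self, key, value):
        self.log.append((key, value))

    def poll(self, group, max_records=10):
        start = self.committed.get(group, 0)
        return self.log[start:start + max_records]

    def commit(self, group, count):
        self.committed[group] = self.committed.get(group, 0) + count


topic = Topic()
for reading in (1.34, 1.40, 1.52):
    topic.publish("meter1", reading)

# Auditing reads everything that is available.
auditing = topic.poll("auditing")
topic.commit("auditing", len(auditing))

# Billing is slower and only gets through two readings.
billing = topic.poll("billing", max_records=2)
topic.commit("billing", len(billing))
```

Both groups see every message, and a slow Billing group does not hold back Auditing because the offsets are tracked per group.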
![Page 29: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/29.jpg)
Simple message queue - consumer
[Diagram: Kafka Partition, Auditing Service (Consumer Client, App logic), DB]
1. Consume a batch of messages from Kafka
2. Process messages and send results to wherever necessary (e.g. another Kafka topic)
3. Confirm delivery to Kafka
Kafka Terminology:
- Commit Mode: Manual
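The three steps above can be sketched as a small loop. This is a simulation under stated assumptions: `Partition`, `run_once` and `flaky_audit` are hypothetical stand-ins for a real Kafka client and a real sink, and the failure is injected by hand.

```python
class Partition:
    """In-memory stand-in for one Kafka partition and its commit point."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.committed = 0  # offset of the first unconfirmed message

    def fetch(self, batch_size):
        return self.messages[self.committed:self.committed + batch_size]

    def commit(self, offset):
        self.committed = offset


def run_once(partition, handler, batch_size=2):
    """1. consume a batch, 2. process it, 3. confirm it."""
    batch = partition.fetch(batch_size)
    try:
        for message in batch:
            handler(message)
    except RuntimeError:
        return False  # no commit: the whole batch is delivered again
    partition.commit(partition.committed + len(batch))
    return True


audit_log = []
failures = {"remaining": 1}


def flaky_audit(message):
    """Records the message, then fails once while handling "r2"."""
    audit_log.append(message)
    if message == "r2" and failures["remaining"] > 0:
        failures["remaining"] -= 1
        raise RuntimeError("sink unavailable")


partition = Partition(["r1", "r2", "r3"])
first = run_once(partition, flaky_audit)   # fails while processing r2
second = run_once(partition, flaky_audit)  # r1 and r2 are redelivered
third = run_once(partition, flaky_audit)   # r3
```

Because the commit happens only after processing, a failure leads to redelivery rather than loss, so `audit_log` ends up containing `r1` and `r2` twice. That is the at-least-once behaviour the earlier slide describes.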
![Page 30: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/30.jpg)
Kafka: message confirmation
● Messages confirmed by offset (not individually)
[Diagram: Partition, Consumer, consumed messages, Commit point]
Kafka Terminology:
- Commit Mode: Manual
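Confirming by offset means that a commit covers every message below the commit point. One consequence, sketched here with a hypothetical helper `next_commit_point`, is that the commit point can only move across a contiguous run of processed offsets.

```python
def next_commit_point(commit_point, processed):
    """Advance the commit point over the contiguous run of processed offsets.

    The commit point is the offset of the first message that is not yet
    confirmed, so everything below it counts as confirmed.
    """
    while commit_point in processed:
        commit_point += 1
    return commit_point


# Offsets 0, 1 and 3 are done, 2 is still in flight.
stalled = next_commit_point(0, {0, 1, 3})
# Once 2 completes, the commit point can move past 3 as well.
advanced = next_commit_point(stalled, {0, 1, 2, 3})
```

Here `stalled` stays at 2 even though offset 3 has been processed, because committing 4 would also confirm the unfinished message at offset 2.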
![Page 32: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/32.jpg)
Parallel workers
[Diagram: several Electric Meters, topic Electric Readings with Partition #1, #2, ... #N, Auditing node #1, #2, ... #N]
Kafka Terminology:
- Partition Count: >1
- Single Consumer Group
![Page 33: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/33.jpg)
Consumer for parallel processing
● Same arrangement from consumer perspective
[Diagram: three Kafka Partitions, Auditing Service (Consumer Client, App logic), DB]
Kafka Terminology:
- Partition Count: >1
- Commit Mode: Manual
![Page 34: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/34.jpg)
Orchestration
● Provide Scaling Capability
● Restart or replace failed nodes
[Diagram: several Electric Meters, topic Electric Readings with Partition #1, #2, ... #N, Auditing node #1, #2, ... #N; Mesos/Marathon adds a new node]
![Page 35: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/35.jpg)
Stateful Processing
● Example: Average electricity consumption per meter for the last hour
[Diagram: several Electric Meters, topic Electric Readings with three Partitions, Aggregation]
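The example aggregation needs state that outlives a single message. A minimal sketch, assuming readings arrive in timestamp order per meter and that timestamps are plain seconds; `HourlyAverage` is a hypothetical class, not part of any Kafka library.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 3600  # "the last hour"


class HourlyAverage:
    """Keeps a per-meter sliding window of (timestamp, value) readings."""

    def __init__(self):
        self.windows = defaultdict(deque)

    def add(self, meter, timestamp, value):
        window = self.windows[meter]
        window.append((timestamp, value))
        # Drop readings that are an hour or more older than the newest one.
        while window and window[0][0] <= timestamp - WINDOW_SECONDS:
            window.popleft()

    def average(self, meter):
        window = self.windows[meter]
        if not window:
            return None
        return sum(value for _, value in window) / len(window)


state = HourlyAverage()
state.add("meter1", 0, 1.0)
state.add("meter1", 1800, 2.0)
state.add("meter1", 3600, 3.0)  # evicts the reading taken at t=0
state.add("meter2", 3600, 5.0)
```

The windows held in `state` are exactly what is lost when a node fails, which is why the following slides turn to data locality and state persistence.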
![Page 36: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/36.jpg)
Stream and state
● Data locality
[Diagram: topic Electric Readings; one Aggregator for Partition #1 and one Aggregator for Partition #2]
![Page 37: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/37.jpg)
Stream and state
● Data locality
[Diagram: topic Electric Readings; one Aggregator for Partition #1 and one Aggregator for Partition #2; messages Key: "meter 1", Value: 9.2 and Key: "meter 2", Value: 2.7]
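Data locality follows from routing messages by key: every reading of one meter lands in the same partition, so a single aggregator sees all of them. The sketch below assumes a stable hash of the key modulo the partition count; `partition_for` is a hypothetical helper, and `zlib.crc32` is a stand-in, since Kafka's own default partitioner uses a different hash (murmur2).

```python
import zlib


def partition_for(key, partition_count):
    """Map a message key to a partition with a stable hash.

    zlib.crc32 is used only because it is deterministic across runs;
    it is not the hash that Kafka itself applies.
    """
    return zlib.crc32(key.encode("utf-8")) % partition_count


readings = [("meter 1", 9.2), ("meter 2", 2.7), ("meter 1", 8.8), ("meter 2", 3.1)]

# Group the readings the way two partitions would receive them.
by_partition = {}
for key, value in readings:
    by_partition.setdefault(partition_for(key, 2), []).append((key, value))
```

Which partition a given meter maps to depends on the hash, but the mapping never changes between messages, so per-meter state can stay local to one aggregator.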
![Page 38: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/38.jpg)
Fault tolerance
● State persistence and recovery
[Diagram: topic Electric Readings; one Aggregator for Partition #1 and one Aggregator for Partition #2]
![Page 39: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/39.jpg)
Fault tolerance
● State persistence and recovery
[Diagram: topic Electric Readings; one Aggregator for Partition #1 and one Aggregator for Partition #2; Persistence]
![Page 40: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/40.jpg)
Stateful Processing app
[Diagram: three Kafka Partitions, Aggregation Service (Consumer Client, Aggregation logic), Persistence (Kafka/DB/?)]
![Page 41: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/41.jpg)
Stateful Processing app
[Diagram: three Kafka Partitions, two Aggregation Service instances (Consumer Client, Aggregation logic), Persistence (Kafka/DB/?)]
Duplicated message processing after recovery.
![Page 42: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/42.jpg)
Stateful Processing app
Persist state with partition offsets
Don't commit! Just fetch more data
[Diagram: three Kafka Partitions, Aggregation Service (Consumer Client, Aggregation logic), Persistence (Kafka/DB/?) holding state and Offsets]
Kafka Terminology:
- Commit Mode: Self Managed
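Storing the offset together with the state is what removes the duplicated processing after recovery. The following is a minimal sketch under stated assumptions: `SnapshotStore` stands in for the shared persistence (Kafka/DB/?), the partition is a plain list, and the crash is simulated with a hypothetical `stop_before` parameter.

```python
import copy


class SnapshotStore:
    """Stand-in for the shared persistence: one atomic snapshot."""

    def __init__(self):
        self.snapshot = {"offset": 0, "totals": {}}

    def save(self, offset, totals):
        # State and offset are written together, so they cannot disagree.
        self.snapshot = {"offset": offset, "totals": copy.deepcopy(totals)}

    def load(self):
        return copy.deepcopy(self.snapshot)


def aggregate(partition, store, stop_before=None):
    """Resume from the stored offset, fold messages into totals,
    and snapshot after every message."""
    snapshot = store.load()
    offset, totals = snapshot["offset"], snapshot["totals"]
    end = len(partition) if stop_before is None else stop_before
    while offset < end:
        key, value = partition[offset]
        totals[key] = totals.get(key, 0) + value
        offset += 1
        store.save(offset, totals)
    return totals


partition = [("meter1", 1), ("meter2", 2), ("meter1", 3), ("meter2", 4)]
store = SnapshotStore()
aggregate(partition, store, stop_before=2)  # simulated crash after two messages
recovered = aggregate(partition, store)     # a replacement node takes over
```

The replacement node starts at the stored offset rather than at a Kafka commit point, so each message contributes to the totals exactly once. A real store would need to make the combined write atomic, for example with a transaction.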
![Page 43: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/43.jpg)
Stateful Processing architecture
● Dynamic partition assignment
● Shared Persistence for State
[Diagram: Aggregator 1, Aggregator 2, Aggregator 3 with partitions assigned among them (labels shown: Partition #1, #2, #4, #6), shared Persistence (Kafka/DB/?), Orchestration Service]
![Page 45: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/45.jpg)
Streaming Patterns
Stateful Processing
● Self-managed processing state
Single Partition Topic
● Strong ordering guarantees
● Limited failure recovery
● Scalability is limited
Multi Partition Topic
● Parallel processing
● Limited ordering guarantees
● Kafka managed processing state
Fanout
● Independent consumer groups
![Page 46: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/46.jpg)
Kafka libraries
● Kafka client support in many languages
● Scala, Java, C
● C bindings -> Haskell, OCaml, Python etc.
[Diagram: Source, Microservice 1 (three instances), Kafka, Microservice 2 (three instances), Sink]
![Page 47: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/47.jpg)
Reactive Streaming APIs
● Similar paradigm as in real-time streaming platforms
● Reactive Kafka
  ○ Based on Akka Reactive Streams API
  ○ Scala + Java
  ○ Developed by Akka team
● Kafka Streams
  ○ Official streaming API for Kafka
  ○ Java
  ○ Developed by Confluent
![Page 48: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/48.jpg)
scala-kafka-client
● Kafka client developed for Scala
● Async and non-blocking
● Built on top of the official Java driver
● Easy API with high performance
/cakesolutions /scala-kafka-client
![Page 49: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/49.jpg)
scala-kafka-client
● Leverage extensive Akka feature set
● Processing logic implemented using Actor Model
[Diagram: Kafka, KafkaConsumer Actor, Receiver Actor, KafkaProducer Actor, Kafka]
![Page 50: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/50.jpg)
Summary
● Leverage Microservice based techniques.
● Streaming topologies can be varied and complex
  ○ Many use-cases fall under a small set of consumer patterns.
● Challenges around scalable and reactive data pipelines
● Kafka provides first-class support for reactive streaming to your applications.
● Stateful processing remains a challenging area.
![Page 51: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/51.jpg)
We didn’t discuss...
● Data serialisation
● Application rolling updates
● Complex streaming topologies
![Page 52: Staging Reactive data pipelines using Simon Souter ...datascienceassn.org/sites/default/files/Staging Reactive Data Pipelin… · Reactive Streaming APIs Similar paradigm as in real-time](https://reader030.fdocuments.net/reader030/viewer/2022040121/5ece3f7c1ea5d522ae670919/html5/thumbnails/52.jpg)
Questions?
MANCHESTER LONDON NEW YORK
/cakesolutions /scala-kafka-client
@cakesolutions
+44 845 617 1200