Kafka Connect by Datio

Transcript of the slide deck "Kafka Connect" by Datio

Page 1

Kafka Connect

Page 2

Contents

1. Introduction to Kafka
2. Data Ingestion
3. Kafka Connect

Page 3

1. INTRODUCTION TO KAFKA

Let's start with the first set of slides.

Page 4
Page 5

Data Pipeline Problem

[Diagram: frontend servers, a database server, and a chat server wired point-to-point to metrics servers and a metrics UI, each link a separate inter-process communication channel.]

Page 6

Data Pipeline Problem

[Diagram, two panels: "A publish/subscribe system", where the servers (frontend, backend, database, chat, shopping cart) publish through a single metrics pub/sub to the metrics server and metrics UI; and "Multiple publish/subscribe systems", where dedicated metrics and logging pub/sub systems feed the metrics UI and log search separately.]

Page 7

Kafka Goals

[Diagram: frontend, backend, database, and shopping cart servers publishing into a single hub that feeds the metrics server, metrics UI, and log search.]

✓ Decouple data pipelines

✓ Provide persistence for message data to allow multiple consumers

✓ Optimize for high throughput of messages

✓ Allow for horizontal scaling of the system to grow as the data streams grow

Page 8

Kafka Architecture

[Diagram: producers and consumers connected to a Kafka cluster of three brokers (Broker 1: Topic A, Partition 0; Broker 2: Topic A, Partition 1; Broker 3: Topics B and C), with ZooKeeper coordinating the cluster.]
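To make topics and partitions concrete, a topic like Topic A above could be created with the kafka-topics.sh tool that ships with Kafka; a minimal sketch, assuming a local ZooKeeper and an illustrative topic name:

bin/kafka-topics.sh --create \
  --zookeeper localhost:2181 \
  --topic topic-a \
  --partitions 2 \
  --replication-factor 1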

Page 9

Disk-based retention
Scalable
High throughput

Page 10

USE CASES

[Diagram: data flowing through Kafka into a Data Lake and refined into Gold Data.]

Page 11

2. DATA INGESTION

Page 12

Kafka Ingestion

[Diagram: a Kafka Producer writing into the Kafka Cluster and a Kafka Consumer reading from it.]

Use Case Requirements

Data loss?
Exactly once?
Latency?
Throughput?

Page 13

Producer

[Diagram: a producer record (topic, partition, key, value) is sent to the Kafka broker, which answers with record metadata or an exception.]

Synchronous send: producer.send(record).get() blocks until the broker responds with the record's metadata or throws an exception.

Asynchronous send: producer.send(record) returns immediately; the metadata or exception is delivered later, typically to a callback.
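Both styles look like this in the Java client; a minimal sketch, assuming a broker on localhost:9092 and an illustrative topic name ("metrics") that is not from the slides:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class ProducerSendModes {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                new ProducerRecord<>("metrics", "host-1", "cpu=0.42");

            // Synchronous send: block until the broker answers.
            RecordMetadata md = producer.send(record).get();
            System.out.printf("sync: partition=%d offset=%d%n",
                              md.partition(), md.offset());

            // Asynchronous send: return at once; the broker's answer
            // arrives in the callback as metadata or an exception.
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();
                } else {
                    System.out.printf("async: partition=%d offset=%d%n",
                                      metadata.partition(), metadata.offset());
                }
            });
        }
    }
}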

Page 14

Producer

[Diagram: send() takes the record (topic, partition, key, value) through the Serializer and the Partitioner into per-partition batches (Topic A Partition 0 and Topic B Partition 1, each holding Batch 0 and Batch 1); the broker commits the batch and returns metadata (topic, partition, offset), or returns an exception, in which case the producer retries if configured to.]
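The partitioner step is pluggable. A hedged sketch of a custom partitioner, assuming the standard org.apache.kafka.clients.producer.Partitioner interface (the class name and key scheme are invented for illustration):

import java.util.Arrays;
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

// Hypothetical partitioner: derive the partition from a hash of the
// key bytes, sending records with no key to partition 0.
public class KeyHashPartitioner implements Partitioner {
    @Override
    public void configure(Map<String, ?> configs) {}

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int numPartitions = cluster.partitionsForTopic(topic).size();
        if (keyBytes == null) {
            return 0;
        }
        return Math.floorMod(Arrays.hashCode(keyBytes), numPartitions);
    }

    @Override
    public void close() {}
}

A producer would opt in through the partitioner.class setting.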

Page 15

Consumer

[Diagram: Topic T1 has four partitions (Partition 0 to Partition 3), each an ordered log of messages numbered 1, 2, 3, and so on. The panels show Consumer Group 1 at different sizes: with fewer consumers than partitions, each consumer reads several partitions (e.g. Consumer 1 and Consumer 2 take two each); with four consumers, each reads exactly one partition; with six consumers, two sit idle, because each partition is consumed by at most one consumer within a group.]
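The group behaviour comes entirely from group.id; a minimal consumer sketch in the Java client, assuming topic "t1" and an illustrative group name:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // Every consumer started with this group.id shares the
        // partitions of the subscribed topics.
        props.put("group.id", "group1");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("t1"));
            while (true) {
                ConsumerRecords<String, String> records =
                    consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                                      record.partition(), record.offset(),
                                      record.value());
                }
            }
        }
    }
}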

Page 16

Apache Flume

[Diagram: a Flume Agent in which a Source hands events to a Channel Processor, the events pass through Interceptor #1 ... #N into a Channel, and a Sink drains the Channel.]

Reliable
Fault tolerant
Customizable
Manageable
Centralized sources

Page 17

Flume + Kafka

[Diagram, two setups: Kafka as a reliable Flume channel, where the agent's Source and Sink read and write through a Kafka-backed Channel; and Flume as Kafka producer/consumer, where the agent produces into or consumes from Kafka directly.]
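A hedged sketch of the first setup as a Flume agent definition, using Flume's Kafka-backed channel (the agent, topic, source, and sink here are placeholders chosen for illustration):

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Kafka-backed channel: events are persisted in a Kafka topic,
# so they survive agent restarts.
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers = localhost:9092
a1.channels.c1.kafka.topic = flume-channel

# Illustrative source and sink around the channel.
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1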

Page 18

3. KAFKA CONNECT

Page 19

Kafka Connect

[Diagram: a Data Source feeds a Kafka Source Connect(or), the Schema Registry sits alongside Kafka, and a Kafka Sink Connect(or) feeds the Data Sink.]

Ingestion integration
Streaming and batch
Scales to the application
Failover control
Accessible connector API

Page 20

Worker Mode

[Diagram: a Connector splits its inputs (Input 1, Input 2, ..., Input N) into Tasks; each Worker Task runs as a thread inside a Worker and wraps a Kafka producer (source side) or Kafka consumer (sink side). Workers run in one of two modes: Standalone or Distributed.]
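Both modes are launched with the scripts in Kafka's bin/ directory; a sketch, with placeholder property-file names:

# Standalone: a single worker; connector configs are passed on the command line
bin/connect-standalone.sh worker.properties connector1.properties

# Distributed: start each worker with the shared cluster config;
# connectors are then submitted through the REST API (see page 22)
bin/connect-distributed.sh worker-distributed.properties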

Page 21

[Diagram: the same producer send path as on page 14, from send() through serializer, partitioner, and batches to the broker.]

Worker settings to ensure no data loss

request.timeout.ms=MAX_VALUE

retries=MAX_VALUE

max.in.flight.requests.per.connection=1

acks=all

max.block.ms=MAX_VALUE
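In a Connect worker these settings reach the embedded producer via the producer. prefix in the worker config; a hedged sketch, with MAX_VALUE written out as the concrete int/long maxima the properties expect:

producer.acks=all
producer.retries=2147483647
producer.max.in.flight.requests.per.connection=1
producer.request.timeout.ms=2147483647
producer.max.block.ms=9223372036854775807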

Page 22

[Diagram: a standalone worker keeps the worker config and all source configs to itself, while in distributed mode three workers (Worker 1 to Worker 3) share the connectors and tasks, e.g. Conn 1 Task 1 (partitions 1,2), Conn 1 Task 2 (partitions 3,4), Conn 1 Task 3 (partitions 5,6), Conn 2 Task 1 (partitions 1,2), and Conn 2 Task 2 (partitions 3,4) spread across the cluster.]

Standalone worker: simple; one worker runs N connectors/tasks.

Distributed worker: scalability and fault tolerance; connectors and tasks are shared across workers.
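In distributed mode a connector is submitted to any worker over the Connect REST API (port 8083 by default); a sketch using the stock FileStreamSource connector and an invented connector name:

curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
        "name": "file-source-demo",
        "config": {
          "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
          "tasks.max": "1",
          "file": "/tmp/input.txt",
          "topic": "connect-demo"
        }
      }'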

Page 23

Schema Registry

[Diagram: a Worker Task's sendRecords() turns SourceRecords into ProducerRecords; on the way out, the serializer registers the record's schema with the Schema Registry under a subject (per topic) and version and embeds the returned schema id in the message; the consumer's deserializer uses that id to fetch the schema back.]

REST API
Serializers
Formatters
Multiple versions of the same schema
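With Confluent's Schema Registry this is typically wired into Connect through the Avro converter; a hedged sketch of the relevant worker settings, with a placeholder registry URL:

key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081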