From Big to Fast Data. How #kafka and #kafka-connect can redefine your ETL and #stream-processing

2010

2014

- Error handling first class citizen 

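schema registry

[Diagram: Your App → Producer → Serializer → Kafka topic. The serializer asks the schema registry to check if the format is acceptable and retrieves the schema ID. It then writes "Schema ID + Data" to the topic, or fails with an incompatible data error.]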

producerProps.put("key.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
producerProps.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
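
To make the "Schema ID + Data" step concrete, here is a minimal sketch in plain Java. It assumes the framing documented for the Confluent serializers (one magic byte with value 0, a 4-byte big-endian schema ID, then the Avro-encoded bytes). The class name, method names and the two URL parameters are illustrative and not from the talk; `schema.registry.url` is the setting the Avro serializer uses to locate the registry.

```java
import java.nio.ByteBuffer;
import java.util.Properties;

class AvroProducerSketch {

    // Builds producer settings. Both arguments are placeholders supplied by
    // the caller; they are assumptions, not values from the slides.
    static Properties producerProps(String bootstrapServers, String schemaRegistryUrl) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        // The Avro serializer needs to know where the schema registry lives.
        props.put("schema.registry.url", schemaRegistryUrl);
        return props;
    }

    // Frames an already Avro-encoded payload as:
    // magic byte 0, 4-byte schema ID, payload bytes.
    static byte[] frame(int schemaId, byte[] avroPayload) {
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + avroPayload.length);
        buf.put((byte) 0);      // magic byte
        buf.putInt(schemaId);   // ByteBuffer is big-endian by default
        buf.put(avroPayload);
        return buf.array();
    }

    // Reads the schema ID back from a framed message.
    static int schemaIdOf(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        if (buf.get() != 0) {
            throw new IllegalArgumentException("Unknown magic byte");
        }
        return buf.getInt();
    }
}
```

In a real application the serializer does this framing itself; the sketch only shows why a consumer can look up the right schema from the first five bytes of each message.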

[Diagram: generate data; shipments topic + sales topic → spark streaming → low inventory topic]

let’s see some code

Define the data contract / schema in Avro format
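
The schema shown on the slide is not part of this transcript, so the following is only a sketch of what such a data contract might look like. The namespace, record name and field names are hypothetical; the schema is held in a Java string so it can be handed to an Avro parser.

```java
class ShipmentSchemaSketch {

    // Hypothetical Avro record schema; the namespace, record name and
    // field names are illustrative and not taken from the slides.
    static final String SHIPMENT_SCHEMA =
        "{\n"
        + "  \"type\": \"record\",\n"
        + "  \"namespace\": \"com.example.inventory\",\n"
        + "  \"name\": \"Shipment\",\n"
        + "  \"fields\": [\n"
        + "    {\"name\": \"itemId\", \"type\": \"string\"},\n"
        + "    {\"name\": \"quantity\", \"type\": \"long\"}\n"
        + "  ]\n"
        + "}";
}
```

With the Avro library on the classpath, `new org.apache.avro.Schema.Parser().parse(ShipmentSchemaSketch.SHIPMENT_SCHEMA)` would turn this string into a `Schema` object.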

generate data

1.9 M msg/sec using 1 thread

https://schema-registry-ui.landoop.com

Schemas registered for us :-)

Defining the typed data format

Initiate the streaming from 2 topics

The business logic
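
The code on this slide is not in the transcript. As a rough illustration of the idea, and not the Spark Streaming job from the talk, the sketch below computes low inventory from shipment and sale events in plain Java. It assumes stock is the quantity shipped minus the quantity sold, and that an item counts as low when its stock falls below a threshold; the class, field and method names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LowInventorySketch {

    // Hypothetical event: a quantity of one item, either shipped in or sold.
    static final class ItemQuantity {
        final String itemId;
        final long quantity;

        ItemQuantity(String itemId, long quantity) {
            this.itemId = itemId;
            this.quantity = quantity;
        }
    }

    // Returns the IDs of items whose stock (shipped minus sold) is below
    // the threshold, sorted by item ID.
    static List<String> lowInventory(List<ItemQuantity> shipments,
                                     List<ItemQuantity> sales,
                                     long threshold) {
        Map<String, Long> stock = new TreeMap<>();
        for (ItemQuantity s : shipments) {
            stock.merge(s.itemId, s.quantity, Long::sum);   // add shipped units
        }
        for (ItemQuantity s : sales) {
            stock.merge(s.itemId, -s.quantity, Long::sum);  // subtract sold units
        }
        List<String> low = new ArrayList<>();
        for (Map.Entry<String, Long> e : stock.entrySet()) {
            if (e.getValue() < threshold) {
                low.add(e.getKey());
            }
        }
        return low;
    }
}
```

In a streaming job the same join and subtraction would run per batch or window over the two topics, with the resulting item IDs written to the low inventory topic.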

[Diagram: shipments topic + sales topic → spark streaming → low inventory topic → elastic-search, re-ordering]

Simple is beautiful

landoop.com/blog
github.com/landoop