Kafka Connect & Kafka Streams/KSQL - the ecosystem around Kafka


  1. 1. BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH Kafka Connect & Streams - the Ecosystem around Kafka Guido Schmutz 29.11.2017 @gschmutz guidoschmutz.wordpress.com
  2. 2. Guido Schmutz Working at Trivadis for more than 20 years Oracle ACE Director for Fusion Middleware and SOA Consultant, Trainer Software Architect for Java, Oracle, SOA and Big Data / Fast Data Head of Trivadis Architecture Board Technology Manager @ Trivadis More than 30 years of software development experience Contact: guido.schmutz@trivadis.com Blog: http://guidoschmutz.wordpress.com Slideshare: http://www.slideshare.net/gschmutz Twitter: gschmutz Kafka Connect & Streams - the Ecosystem around Kafka
  3. 3. Our company. Kafka Connect & Streams - the Ecosystem around Kafka Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services in Switzerland, Germany, Austria and Denmark. We offer our services in the following strategic business fields. Trivadis Services takes over the operation of your IT systems.
  4. 4. COPENHAGEN MUNICH LAUSANNE BERN ZURICH BRUGG GENEVA HAMBURG DÜSSELDORF FRANKFURT STUTTGART FREIBURG BASEL VIENNA With over 600 specialists and IT experts in your region. Kafka Connect & Streams - the Ecosystem around Kafka 14 Trivadis branches and more than 600 employees, 200 Service Level Agreements, over 4,000 training participants, research and development budget: CHF 5.0 million, financially self-supporting and sustainably profitable, experience from more than 1,900 projects per year at over 800 customers.
  5. 5. Agenda 1. What is Apache Kafka? 2. Kafka Connect 3. Kafka Streams 4. KSQL 5. Kafka and "Big Data" / "Fast Data" Ecosystem 6. Kafka in Software Architecture Kafka Connect & Streams - the Ecosystem around Kafka
  6. 6. Demo Example Truck-2 truck/nn/position Truck-1 Truck-3 mqtt-source truck_position detect_dangerous_driving dangerous_driving Truck Driver jdbc-source trucking_driver join_dangerous_driving_driver dangerous_driving_driver console consumer 2016-06-02 14:39:56.605|98|27|803014426|Wichita to Little Rock Route2|Normal|38.65|90.21|5187297736652502631 Kafka Connect & Streams - the Ecosystem around Kafka 27,Walter,Ward,Y,24-JUL-85,2017-10-02 15:19:00 {"id":27,"firstName":"Walter","lastName":"Ward","available":"Y","birthdate":"24-JUL-85","last_update":1506923052012}
  7. 7. What is Apache Kafka? Kafka Connect & Streams - the Ecosystem around Kafka
  8. 8. Apache Kafka History 2012-2018: 0.7 - cluster mirroring, data compression; 0.8 - intra-cluster replication; 0.9 - Data Integration (Connect API); 0.10 - Data Processing (Streams API); 0.11 - Exactly-Once Semantics, performance improvements; 1.0 - JBOD support, Java 9 support; KSQL Developer Preview. Kafka Connect & Streams - the Ecosystem around Kafka
  9. 9. Apache Kafka - Unix Analogy $ cat < in.txt | grep "kafka" | tr a-z A-Z > out.txt Kafka Connect API (in) | Kafka Streams API / KSQL | Kafka Connect API (out), on top of Kafka Core (Cluster). Adapted from: Confluent. Kafka Connect & Streams - the Ecosystem around Kafka
  10. 10. Kafka High Level Architecture. The who is who: producers write data to brokers, consumers read data from brokers, and all of this is distributed. The data: data is stored in topics; topics are split into partitions, which are replicated. Kafka Cluster: Producers, Brokers 1-3, Consumers, Zookeeper Ensemble. Kafka Connect & Streams - the Ecosystem around Kafka
  11. 11. Kafka Distributed Log at the Core. At the heart of Apache Kafka sits a distributed log: a collection of messages, appended sequentially to a file. A service seeks to the position of the last message it read, then scans sequentially, reading messages in order. This log-structured character makes Kafka well suited to performing the role of an Event Store in Event Sourcing / Event Hub. Reads are a single seek & scan; writes are append-only. Kafka Connect & Streams - the Ecosystem around Kafka
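The append-only write / seek-and-scan read pattern described above can be sketched with a plain in-memory list (class and method names here are made up for illustration; this is not the Kafka API):

```java
import java.util.ArrayList;
import java.util.List;

public class LogSketch {
    private final List<String> log = new ArrayList<>();

    // Writes are append-only: each message gets the next offset.
    int append(String message) {
        log.add(message);
        return log.size() - 1;
    }

    // Reads are a single seek to an offset, then a sequential scan.
    List<String> readFrom(int offset) {
        return log.subList(offset, log.size());
    }

    public static void main(String[] args) {
        LogSketch hub = new LogSketch();
        hub.append("pos-1");
        hub.append("pos-2");
        hub.append("pos-3");
        System.out.println(hub.readFrom(1)); // prints [pos-2, pos-3]
    }
}
```

A consumer that remembers the last offset it read can resume exactly where it left off, which is the basis for both replay and the consumer-group mechanics shown later.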
  12. 12. Scale-Out Architecture. A topic consists of many partitions; producer load is balanced over all partitions; a consumer group can consume with as many threads as there are partitions. Producers 1-3, Brokers 1-3, Consumers 1-4 in Consumer Group 1 and Consumer Group 2, Kafka Cluster. Kafka Connect & Streams - the Ecosystem around Kafka
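Consumers that share a `group.id` split a topic's partitions among themselves, which is what enables the scale-out above. A minimal consumer configuration might look as follows (broker address and group name are placeholders):

```properties
bootstrap.servers=broker-1:9092
# consumers with the same group.id share the topic's partitions
group.id=truck-dashboard
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
```

Starting a second process with the same `group.id` hands it some of the partitions; a process with a different `group.id` receives the full stream again independently.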
  13. 13. Strong Ordering Guarantees. Most business systems need strong ordering guarantees. Messages that require relative ordering need to be sent to the same partition: supply the same key for all messages that require a relative order. To maintain global ordering, use a single-partition topic. Kafka Connect & Streams - the Ecosystem around Kafka
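Kafka's default partitioner derives the partition from a hash of the message key (murmur2 over the serialized key in the real implementation). The simplified sketch below only illustrates the property that matters for ordering: the same key always lands in the same partition.

```java
public class KeyPartitioning {
    // Simplified stand-in for Kafka's default partitioner
    // (the real one hashes the serialized key with murmur2).
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 8;
        // All events of driver "27" map to one partition, so their
        // relative order is preserved for any consumer of that partition.
        System.out.println(
            partitionFor("27", partitions) == partitionFor("27", partitions)); // prints true
    }
}
```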
  14. 14. Durable and Highly Available Messaging [diagram: partitions P0/P1 replicated across Brokers 1-3] Kafka Connect & Streams - the Ecosystem around Kafka
  15. 15. Durable and Highly Available Messaging (II) [diagram: consumers continue reading from replica partitions] Kafka Connect & Streams - the Ecosystem around Kafka
  16. 16. Replay-ability - Logs never forget. By keeping events in a log, we have a version control system for our data. If you were to deploy a faulty program, the system might become corrupted, but it would always be recoverable. The sequence of events provides an audit point, so that you can examine exactly what happened. Rewind and replay events once the service is back and the bug is fixed. Kafka Connect & Streams - the Ecosystem around Kafka
  17. 17. Hold Data for Long-Term - Data Retention. 1. Never 2. Time based (TTL): log.retention.{ms | minutes | hours} 3. Size based: log.retention.bytes 4. Log compaction based (entries with the same key are removed): kafka-topics.sh --zookeeper zk:2181 --create --topic customers --replication-factor 1 --partitions 1 --config cleanup.policy=compact Kafka Connect & Streams - the Ecosystem around Kafka
  18. 18. Keep Topics in Compacted Form. Before compaction - Offset: 0 1 2 3 4 5 6 7 8 9 10; Key: K1 K2 K1 K1 K3 K2 K4 K5 K5 K2 K6; Value: V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11. After compaction - Offset: 3 4 6 8 9 10; Key: K1 K3 K4 K5 K2 K6; Value: V4 V5 V7 V9 V10 V11. Kafka Connect & Streams - the Ecosystem around Kafka
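The compaction step above can be reproduced with a small sketch that keeps only the latest value per key (a simplification of what Kafka's log cleaner does in the background; class and method names are made up):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CompactionSketch {
    // Keep only the latest value per key, in the order the surviving
    // records appear in the log (like the slide's compacted table).
    static Map<String, String> compact(String[][] log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] entry : log) {        // entry = {key, value}
            latest.remove(entry[0]);        // drop the superseded record
            latest.put(entry[0], entry[1]); // re-insert at the tail
        }
        return latest;
    }

    public static void main(String[] args) {
        String[][] log = {
            {"K1","V1"}, {"K2","V2"}, {"K1","V3"}, {"K1","V4"}, {"K3","V5"},
            {"K2","V6"}, {"K4","V7"}, {"K5","V8"}, {"K5","V9"}, {"K2","V10"},
            {"K6","V11"}
        };
        // prints {K1=V4, K3=V5, K4=V7, K5=V9, K2=V10, K6=V11}
        System.out.println(compact(log));
    }
}
```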
  19. 19. How to get a Kafka environment. On Premises: Bare Metal Installation, Docker, Mesos / Kubernetes, Hadoop Distributions. Cloud: Oracle Event Hub Cloud Service, Azure HDInsight Kafka, Confluent Cloud. Kafka Connect & Streams - the Ecosystem around Kafka
  20. 20. Demo (I) Truck-2 truck position Truck-1 Truck-3 console consumer 2016-06-02 14:39:56.605|98|27|803014426|Wichita to Little Rock Route2|Normal|38.65|90.21|5187297736652502631 Testdata Generator by Hortonworks. Kafka Connect & Streams - the Ecosystem around Kafka
  21. 21. Demo (I) - Create Kafka Topic $ kafka-topics --zookeeper zookeeper:2181 --create --topic truck_position --partitions 8 --replication-factor 1 $ kafka-topics --zookeeper zookeeper:2181 --list __consumer_offsets _confluent-metrics _schemas docker-connect-configs docker-connect-offsets docker-connect-status truck_position Kafka Connect & Streams - the Ecosystem around Kafka
  22. 22. Demo (I) Run Producer and Kafka-Console-Consumer Kafka Connect & Streams - the Ecosystem around Kafka
  23. 23. Demo (I) - Java Producer to "truck_position" - Constructing a Kafka Producer:
private Properties kafkaProps = new Properties();
kafkaProps.put("bootstrap.servers", "broker-1:9092");
kafkaProps.put("key.serializer", "...StringSerializer");
kafkaProps.put("value.serializer", "...StringSerializer");
producer = new KafkaProducer<String, String>(kafkaProps);
ProducerRecord<String, String> record =
    new ProducerRecord<>("truck_position", driverId, eventData);
try {
    metadata = producer.send(record).get();
} catch (Exception e) {
    // handle/log the failed send instead of swallowing it
}
Kafka Connect & Streams - the Ecosystem around Kafka
  24. 24. Demo (II) - devices send to MQTT instead of Kafka Truck-2 truck/nn/position Truck-1 Truck-3 2016-06-02 14:39:56.605|98|27|803014426|Wichita to Little Rock Route2|Normal|38.65|90.21|5187297736652502631 Kafka Connect & Streams - the Ecosystem around Kafka
  25. 25. Demo (II) devices send to MQTT instead of Kafka Kafka Connect & Streams - the Ecosystem around Kafka
  26. 26. Demo (II) - devices send to MQTT instead of Kafka - how to get the data into Kafka? Truck-2 truck/nn/position Truck-1 Truck-3 truck position raw ? 2016-06-02 14:39:56.605|98|27|803014426|Wichita to Little Rock Route2|Normal|38.65|90.21|5187297736652502631 Kafka Connect & Streams - the Ecosystem around Kafka
  27. 27. Kafka Connect Kafka Connect & Streams - the Ecosystem around Kafka
  28. 28. Kafka Connect - Overview Source Connector Sink Connector Kafka Connect & Streams - the Ecosystem around Kafka
  29. 29. Kafka Connect - Single Message Transforms (SMT). Simple transformations for a single message, defined as part of Kafka Connect. Some useful transforms are provided out-of-the-box, and you can easily implement your own. Optionally deploy one or more transforms with each connector, to modify messages produced by a source connector or modify messages sent to sink connectors. This makes it much easier to mix and match connectors. Some of the currently available transforms: Inse
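A transform chain is declared in the connector configuration itself. A minimal sketch using the out-of-the-box InsertField transform (connector, file, topic, and field names here are illustrative placeholders, not taken from the demo):

```json
{
  "name": "file-source",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "file": "/tmp/truck_position.txt",
    "topic": "truck_position",
    "transforms": "addSource",
    "transforms.addSource.type": "org.apache.kafka.connect.transforms.InsertField$Value",
    "transforms.addSource.static.field": "data_source",
    "transforms.addSource.static.value": "file-import"
  }
}
```

Each record's value gets an extra data_source field before it is written to the topic; chaining further transforms is just a matter of listing more aliases in the transforms property.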