Kafka Connect & Streams - the ecosystem around Kafka

Transcript of Kafka Connect & Streams - the ecosystem around Kafka

  1. 1. Kafka Connect & Streams the Ecosystem around Kafka Guido Schmutz @gschmutz doag2017
  2. 2. Guido Schmutz Working at Trivadis for more than 20 years Oracle ACE Director for Fusion Middleware and SOA Consultant, Trainer Software Architect for Java, Oracle, SOA and Big Data / Fast Data Head of Trivadis Architecture Board Technology Manager @ Trivadis More than 30 years of software development experience Contact: guido.schmutz@trivadis.com Blog: http://guidoschmutz.wordpress.com Slideshare: http://www.slideshare.net/gschmutz Twitter: gschmutz Kafka Connect & Streams - the Ecosystem around Kafka
  3. 3. Our company. Kafka Connect & Streams - the Ecosystem around Kafka Trivadis is a market leader in IT consulting, system integration, solution engineering and the provision of IT services, focusing on … technologies in Switzerland, Germany, Austria and Denmark. We offer our services in the following strategic business fields: … Trivadis Services takes over the interacting operation of your IT systems. (OPERATION)
  4. 4. COPENHAGEN MUNICH LAUSANNE BERN ZURICH BRUGG GENEVA HAMBURG DÜSSELDORF FRANKFURT STUTTGART FREIBURG BASEL VIENNA With over 600 specialists and IT experts in your region. Kafka Connect & Streams - the Ecosystem around Kafka 14 Trivadis branches and more than 600 employees 200 Service Level Agreements Over 4,000 training participants Research and development budget: CHF 5.0 million Financially self-supporting and sustainably profitable Experience from more than 1,900 projects per year at over 800 customers
  5. 5. Agenda 1. What is Apache Kafka? 2. Kafka Connect 3. Kafka Streams 4. KSQL 5. Kafka and "Big Data" / "Fast Data" Ecosystem 6. Kafka in Software Architecture Kafka Connect & Streams - the Ecosystem around Kafka
  6. 6. Demo Example Truck-2 truck/nn/position Truck-1 Truck-3 mqtt-source truck_position detect_dangerous_driving dangerous_driving Truck Driver jdbc-source trucking_driver join_dangerous_driving_driver dangerous_driving_driver console consumer 2016-06-02 14:39:56.605|98|27|803014426|Wichita to Little Rock Route2|Normal|38.65|90.21|5187297736652502631 Kafka Connect & Streams - the Ecosystem around Kafka 27,Walter,Ward,Y,24-JUL-85,2017-10-02 15:19:00 {"id":27,"firstName":"Walter","lastName":"Ward","available":"Y","birthdate":"24-JUL-85","last_update":1506923052012}
  7. 7. What is Apache Kafka? Kafka Connect & Streams - the Ecosystem around Kafka
  8. 8. Apache Kafka History (2012-2018): 0.7: cluster mirroring, data compression; 0.8: intra-cluster replication; 0.9: Data Integration (Connect API); 0.10: Data Processing (Streams API); 0.11: Exactly-Once Semantics, performance improvements; 1.0: JBOD support, Java 9 support; KSQL Developer Preview Kafka Connect & Streams - the Ecosystem around Kafka
  9. 9. Apache Kafka - Unix Analogy $ cat < in.txt | grep "kafka" | tr a-z A-Z > out.txt Reading the input file corresponds to the Kafka Connect API (source), grep and tr to the Kafka Streams API and KSQL, writing the output file to the Kafka Connect API (sink), and the pipes to Kafka Core (Cluster). Adapted from: Confluent Kafka Connect & Streams - the Ecosystem around Kafka
  10. 10. Kafka High Level Architecture The who is who Producers write data to brokers. Consumers read data from brokers. All this is distributed. The data Data is stored in topics. Topics are split into partitions, which are replicated. Kafka Cluster Consumer Consumer Consumer Producer Producer Producer Broker 1 Broker 2 Broker 3 Zookeeper Ensemble Kafka Connect & Streams - the Ecosystem around Kafka
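  The producer side of this picture is shown in code later in the demo; for completeness, here is a minimal sketch of the consumer side, assuming a broker reachable as broker-1:9092 (the address used in the producer code later in the deck) and the truck_position topic from the demo:

      import java.util.Collections;
      import java.util.Properties;
      import org.apache.kafka.clients.consumer.ConsumerRecord;
      import org.apache.kafka.clients.consumer.ConsumerRecords;
      import org.apache.kafka.clients.consumer.KafkaConsumer;

      public class TruckPositionConsumer {
          public static void main(String[] args) {
              Properties props = new Properties();
              props.put("bootstrap.servers", "broker-1:9092");      // assumed broker address
              props.put("group.id", "truck-position-consumer");     // consumer group used for offset tracking
              props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
              props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");

              KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
              consumer.subscribe(Collections.singletonList("truck_position"));

              while (true) {
                  // fetch whatever has been appended to the assigned partitions since the last poll
                  ConsumerRecords<String, String> records = consumer.poll(100);
                  for (ConsumerRecord<String, String> record : records) {
                      System.out.printf("partition=%d offset=%d value=%s%n",
                              record.partition(), record.offset(), record.value());
                  }
              }
          }
      }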
  11. 11. Kafka Producer Write Ahead Log / Commit Log Producers always append to the tail (append to file, i.e. segment) Order is preserved for messages within the same partition (Figure: a Truck producer appending message 6 to the tail of the Movement topic on a Kafka broker) Kafka Connect & Streams - the Ecosystem around Kafka
  12. 12. Kafka Consumer - Partition offsets Offset: a sequential id number assigned to messages in the partitions; uniquely identifies a message within a partition. Consumers track their pointers via (offset, partition, topic) tuples. Since Kafka 0.10: seek to offset by timestamp using method KafkaConsumer#offsetsForTimes (Figure: Consumer Group A and Consumer Group B reading one partition, with consumers at the "earliest" offset, at the "latest" offset, and at a specific offset, while new data arrives from the producer) Kafka Connect & Streams - the Ecosystem around Kafka
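  A hedged sketch of the offsetsForTimes call mentioned above, assuming the consumer from the previous sketch, this time with a partition assigned manually so it can be repositioned explicitly (uses org.apache.kafka.common.TopicPartition and org.apache.kafka.clients.consumer.OffsetAndTimestamp):

      // assumption: 'consumer' is the KafkaConsumer<String, String> from the earlier sketch
      TopicPartition partition = new TopicPartition("truck_position", 0);
      consumer.assign(Collections.singletonList(partition));

      // ask the broker for the earliest offset whose timestamp is >= one hour ago
      long oneHourAgo = System.currentTimeMillis() - 60 * 60 * 1000;
      Map<TopicPartition, OffsetAndTimestamp> offsets =
              consumer.offsetsForTimes(Collections.singletonMap(partition, oneHourAgo));

      OffsetAndTimestamp found = offsets.get(partition);
      if (found != null) {
          // reposition the consumer at that offset and continue consuming from there
          consumer.seek(partition, found.offset());
      }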
  13. 13. How to get a Kafka environment Kafka Connect & Streams - the Ecosystem around Kafka On Premises: Bare Metal Installation, Docker, Mesos / Kubernetes, Hadoop Distributions. Cloud: Oracle Event Hub Cloud Service, Confluent Cloud
  14. 14. Demo (I) Truck-2 truck position Truck-1 Truck-3 console consumer 2016-06-02 14:39:56.605|98|27|803014426|Wichita to Little Rock Route2|Normal|38.65|90.21|5187297736652502631 Test data generator by Hortonworks Kafka Connect & Streams - the Ecosystem around Kafka
  15. 15. Demo (I) Create Kafka Topic $ kafka-topics --zookeeper zookeeper:2181 --create --topic truck_position --partitions 8 --replication-factor 1 $ kafka-topics --zookeeper zookeeper:2181 --list __consumer_offsets _confluent-metrics _schemas docker-connect-configs docker-connect-offsets docker-connect-status truck_position Kafka Connect & Streams - the Ecosystem around Kafka
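  To double-check the partition count and replication factor after creating the topic, the same kafka-topics tool can be used with --describe; a small usage sketch against the same ZooKeeper address as above:

      $ kafka-topics --zookeeper zookeeper:2181 --describe --topic truck_position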
  16. 16. Demo (I) Run Producer and Kafka-Console-Consumer Kafka Connect & Streams - the Ecosystem around Kafka
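  The console consumer used in the demo is the standard kafka-console-consumer tool; a minimal sketch of the invocation, assuming a broker reachable as broker-1:9092 (the address used in the producer slide that follows):

      $ kafka-console-consumer --bootstrap-server broker-1:9092 --topic truck_position --from-beginning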
  17. 17. Demo (I) Java Producer to "truck_position" Constructing a Kafka Producer:

      private Properties kafkaProps = new Properties();
      kafkaProps.put("bootstrap.servers", "broker-1:9092");
      kafkaProps.put("key.serializer", "...StringSerializer");
      kafkaProps.put("value.serializer", "...StringSerializer");
      producer = new KafkaProducer<String, String>(kafkaProps);

      ProducerRecord<String, String> record =
              new ProducerRecord<>("truck_position", driverId, eventData);
      try {
          metadata = producer.send(record).get();   // synchronous send: wait for the broker acknowledgement
      } catch (Exception e) {
          // handle/log the send failure
      }

  Kafka Connect & Streams - the Ecosystem around Kafka
  18. 18. Demo (II) devices send to MQTT instead of Kafka Truck-2 truck/nn/position Truck-1 Truck-3 2016-06-02 14:39:56.605|98|27|803014426|Wichita to Little Rock Route2|Normal|38.65|90.21|5187297736652502631 Kafka Connect & Streams - the Ecosystem around Kafka
  19. 19. Demo (II) devices send to MQTT instead of Kafka Kafka Connect & Streams - the Ecosystem around Kafka
  20. 20. Demo (II) - devices send to MQTT instead of Kafka - how to get the data into Kafka? Truck-2 truck/nn/position Truck-1 Truck-3 truck position raw ? 2016-06-02 14:39:56.605|98|27|803014426|Wichita to Little Rock Route2|Normal|38.65|90.21|5187297736652502631 Kafka Connect & Streams - the Ecosystem around Kafka
  21. 21. Kafka Connect Kafka Connect & Streams - the Ecosystem around Kafka
  22. 22. Kafka Connect - Overview Source Connector Sink Connector Kafka Connect & Streams - the Ecosystem around Kafka
  23. 23. Kafka Connect Single Message Transforms (SMT) Simple transformations for a single message Defined as part of Kafka Connect Some useful transforms provided out-of-the-box Easily implement your own Optionally deploy 1+ transforms with each connector Modify messages produced by a source connector Modify messages sent to sink connectors Makes it much easier to mix and match connectors Some of the currently available transforms: InsertField, ReplaceField, MaskField, ValueToKey, ExtractField, TimestampRouter, RegexRouter, SetSchemaMetadata, Flatten, TimestampConverter Kafka Connect & Streams - the Ecosystem around Kafka
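  As an illustration of how such a transform is wired into a connector, here is a hedged sketch of the extra configuration entries for a TimestampRouter; the transform alias routeByTime is made up for this example, and the entries would be added to the "config" section of a connector definition like the MQTT source shown later:

      "transforms": "routeByTime",
      "transforms.routeByTime.type": "org.apache.kafka.connect.transforms.TimestampRouter",
      "transforms.routeByTime.topic.format": "${topic}-${timestamp}",
      "transforms.routeByTime.timestamp.format": "yyyyMMdd"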
  24. 24. Kafka Connect Many Connectors 60+ since first release (0.9+) 20+ from Confluent and Partners Source: http://www.confluent.io/product/connectors Confluent supported Connectors, Certified Connectors, Community Connectors Kafka Connect & Streams - the Ecosystem around Kafka
  25. 25. Demo (III) Truck-2 truck/nn/position Truck-1 Truck-3 mqtt to kafka truck_position 2016-06-02 14:39:56.605|98|27|803014426|Wichita to Little Rock Route2|Normal|38.65|90.21|5187297736652502631 console consumer Kafka Connect & Streams - the Ecosystem around Kafka
  26. 26. Demo (III) Create MQTT Connect through REST API

      #!/bin/bash
      curl -X "POST" "http://192.168.69.138:8083/connectors" \
           -H "Content-Type: application/json" \
           -d $'{
        "name": "mqtt-source",
        "config": {
          "connector.class": "com.datamountaineer.streamreactor.connect.mqtt.source.MqttSourceConnector",
          "connect.mqtt.connection.timeout": "1000",
          "tasks.max": "1",
          "connect.mqtt.kcql": "INSERT INTO truck_position SELECT * FROM truck/+/position",
          "name": "MqttSourceConnector",
          "connect.mqtt.service.quality": "0",
          "connect.mqtt.client.id": "tm-mqtt-connect-01",
          "connect.mqtt.converter.throw.on.error": "true",
          "connect.mqtt.hosts": "tcp://mosquitto:1883"
        }
      }'

  Kafka Connect & Streams - the Ecosystem around Kafka
  27. 27. Demo (III) Call REST API and Kafka Console Consumer Kafka Connect & Streams - the Ecosystem around Kafka
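  Once the connector definition has been posted, the Kafka Connect REST interface can be used to verify that the connector and its task are running, and the console consumer can then tail the target topic; a possible sketch, reusing the worker address from the previous slide and assuming a broker at broker-1:9092:

      $ curl http://192.168.69.138:8083/connectors/mqtt-source/status
      $ kafka-console-consumer --bootstrap-server broker-1:9092 --topic truck_position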
  28. 28. Demo (III) Truck-2 truck/nn/position Truck-1 Truck-3 mqtt to kafka truck_position 2016-06-02 14:39:56.605|98|27|803014426|Wichita to Little Rock Route2|Normal|38.65|90.21|5187297736652502631 console consumer what about some analytics? Kafka Connect & Streams - the Ecosystem around Kafka
  29. 29. Kafka Streams Kafka Connect & Streams - the Ecosystem around Kafka
  30. 30. Kafka Streams - Overview Designed as a simple and lightweight library in Apache Kafka no external dependencies on systems other than Apache Kafka Part of open source Apache Kafka, introduced in 0.10+ Leverages Kafka as its internal messaging layer Supports fault-tolerant local state Event-at-a-time processing (not microbatch) with millisecond latency Windowing with out-of-order data using a Google DataFlow-like model Kafka Connect & Streams - the Ecosystem around Kafka
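  To make these points concrete before the DSL slide that follows, here is a minimal sketch of a Kafka Streams application in the spirit of the demo; the topic names come from the demo pipeline, while the broker address and the simple string-based filter condition are assumptions for illustration:

      import java.util.Properties;
      import org.apache.kafka.common.serialization.Serdes;
      import org.apache.kafka.streams.KafkaStreams;
      import org.apache.kafka.streams.StreamsBuilder;
      import org.apache.kafka.streams.StreamsConfig;
      import org.apache.kafka.streams.kstream.KStream;

      public class DetectDangerousDriving {
          public static void main(String[] args) {
              Properties props = new Properties();
              props.put(StreamsConfig.APPLICATION_ID_CONFIG, "detect-dangerous-driving");
              props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-1:9092");   // assumed broker address
              props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
              props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

              StreamsBuilder builder = new StreamsBuilder();

              // read the raw position events and keep only those not flagged as "Normal"
              KStream<String, String> positions = builder.stream("truck_position");
              KStream<String, String> dangerous =
                      positions.filter((driverId, event) -> !event.contains("Normal"));
              dangerous.to("dangerous_driving");

              KafkaStreams streams = new KafkaStreams(builder.build(), props);
              streams.start();
          }
      }

  The internal changelog and repartition topics created by the library are prefixed with the application id, which is why each Streams application needs a unique one.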
  31. 31. Kafka Stream DSL and Processor Topology KStream stream1 = builder.stream("in-1"); KStream stream2= build