Kafka connect-london-meetup-2016

Stream All Things: Real-time Data Integration at Scale with Apache Kafka

By Gwen Shapira




[Diagram: example architecture with clients ("Swipe here!"), a web app, Flume agents, Kafka, and two Hadoop clusters (storage and processing). Components: HDFS, HBase/memory, local cache, Spark Streaming, Spark, MapReduce, Hive/Impala, and Solr search (via HDFSEventSink and Solr sink). Labeled steps: fetching and updating profiles; adjusting NRT stats and counters; batch time adjustments; analytical adjustments and pattern detection; automated and manual review of NRT changes.]


Data Integration: getting data to all the right places


Introducing Kafka Connect: large-scale streaming data import/export for Kafka


Delivery Guarantees

Offsets are automatically committed and restored

On restart: the task checks its offsets and rewinds

At-least-once delivery – flush data, then commit offsets

Exactly-once for connectors that support it (e.g. HDFS)
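The flush-then-commit ordering can be illustrated with a small in-memory simulation. This is a hypothetical model, not the Kafka Connect API. `SinkTaskSimulation`, `poll`, `flush`, `commit`, and `restart` are illustrative names, and Python lists stand in for both the Kafka partition and the target system. The simulation shows why this ordering gives at-least-once rather than exactly-once delivery: a crash after a flush but before the commit causes records to be replayed.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SinkTaskSimulation:
    """Hypothetical in-memory model of one sink task reading one partition.

    `log` stands in for a Kafka partition, `sink` for the external system,
    and `committed_offset` for the offset storage Kafka Connect manages.
    """
    log: List[str]
    committed_offset: int = 0  # next offset to read after a restart
    buffer: List[Tuple[int, str]] = field(default_factory=list)
    sink: List[str] = field(default_factory=list)

    def poll(self, start: int, max_records: int) -> int:
        """Buffer up to max_records starting at `start`; return the next offset."""
        batch = self.log[start:start + max_records]
        self.buffer.extend((start + i, rec) for i, rec in enumerate(batch))
        return start + len(batch)

    def flush(self) -> None:
        """Write every buffered record to the external system."""
        self.sink.extend(rec for _, rec in self.buffer)

    def commit(self) -> None:
        """Record the offset after the last flushed record, then clear the buffer."""
        if self.buffer:
            self.committed_offset = self.buffer[-1][0] + 1
        self.buffer.clear()

    def restart(self) -> int:
        """Drop in-flight state and rewind to the last committed offset."""
        self.buffer.clear()
        return self.committed_offset


task = SinkTaskSimulation(log=["a", "b", "c", "d"])
pos = task.poll(task.committed_offset, 2)  # buffers a, b
task.flush()
task.commit()                              # committed_offset is now 2
pos = task.poll(pos, 2)                    # buffers c, d
task.flush()                               # c, d reach the sink...
# ...but the worker crashes here, before commit() runs.
pos = task.restart()                       # rewinds to offset 2
pos = task.poll(pos, 2)                    # re-reads c, d
task.flush()
task.commit()
print(task.sink)  # ['a', 'b', 'c', 'd', 'c', 'd']: nothing lost, c and d duplicated
```

Offsets are committed only after data is flushed, so a crash can cause records to be written twice but never skipped. Exactly-once generally requires the target system to store offsets together with the data, so that both are committed atomically.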


Converters

Abstract serialization: one connector, many serialization formats

Convert between the Kafka Connect Data API (used by connectors) and serialized bytes (stored in Kafka)

JSON and Avro are currently well supported
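The idea of one connector with many serialization formats can be sketched in Python. The sketch only loosely mirrors the role of Kafka Connect's converters; the Java `Converter` interface pairs `fromConnectData` with `toConnectData`. The `Schema` alias, `SimpleJsonConverter`, and the pipe-delimited `DelimitedConverter` format are simplified, hypothetical stand-ins. Avro is left out to keep the example dependency-free.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple

# Hypothetical, simplified schema: an ordered list of (field name, Python type).
Schema = List[Tuple[str, type]]


class Converter(ABC):
    """Sketch of the converter role: Connect-level data <-> serialized bytes."""

    @abstractmethod
    def from_connect_data(self, topic: str, schema: Schema, value: Dict[str, Any]) -> bytes:
        ...

    @abstractmethod
    def to_connect_data(self, topic: str, data: bytes) -> Tuple[Schema, Dict[str, Any]]:
        ...


class SimpleJsonConverter(Converter):
    def from_connect_data(self, topic, schema, value):
        # Emit the fields in schema order as a JSON object.
        return json.dumps({name: value[name] for name, _ in schema}).encode("utf-8")

    def to_connect_data(self, topic, data):
        obj = json.loads(data.decode("utf-8"))
        # No embedded schema here, so infer one from the decoded Python types.
        return [(k, type(v)) for k, v in obj.items()], obj


class DelimitedConverter(Converter):
    """Hypothetical format: values joined by '|', schema agreed out of band."""

    def __init__(self, schema: Schema):
        self.schema = schema

    def from_connect_data(self, topic, schema, value):
        return "|".join(str(value[name]) for name, _ in schema).encode("utf-8")

    def to_connect_data(self, topic, data):
        parts = data.decode("utf-8").split("|")
        value = {name: typ(part) for (name, typ), part in zip(self.schema, parts)}
        return self.schema, value


USER_SCHEMA: Schema = [("id", int), ("name", str)]


def source_connector_output() -> Dict[str, Any]:
    """Stand-in for a connector: it only ever produces Connect-level data."""
    return {"id": 42, "name": "alice"}


# The same connector output goes through either converter unchanged.
for converter in (SimpleJsonConverter(), DelimitedConverter(USER_SCHEMA)):
    raw = converter.from_connect_data("users", USER_SCHEMA, source_connector_output())
    _, round_trip = converter.to_connect_data("users", raw)
    print(type(converter).__name__, raw, round_trip)
```

The connector code never touches bytes. Switching formats means swapping the converter, not rewriting the connector.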


Connectors Today

Confluent Open Source – HDFS, JDBC

Connector Hub: connectors.confluent.io

Examples: MySQL, MongoDB, Twitter, Solr, S3, MQTT, Bloomberg, Apache Ignite, Attunity, Couchbase, Vertica, Cassandra, HBase, Kudu, Mixpanel, Syslog, and more


Connectors from the Hackathon

Jenkins connector – Aravind Yarram (Equifax)

Twitter semantic analysis and visualization – Ashish Singh (Cloudera)

Brain monitoring device connector – Silicon Valley Data Science

DynamoDB, Cassandra, Slack, Splunk, and many more


Coming soon…

Improved connector control via REST API, standardized configs, metrics

Single record transformations

Data pipelines in an app: embedded mode & Kafka Streams integration

Many more connectors
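Single record transformations operate on one record at a time. Each step takes a record and returns a modified record, or nothing to drop it. The sketch below assumes that model. `Record`, `mask_field`, `route_to`, `drop_if_missing`, and `run_chain` are hypothetical names, not part of Kafka Connect.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Iterable, List, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical record: a topic name plus a dict-shaped value."""
    topic: str
    value: Dict[str, object]


# A transformation sees exactly one record and returns one record, or None to drop it.
Transform = Callable[[Record], Optional[Record]]


def mask_field(name: str) -> Transform:
    def apply(rec: Record) -> Optional[Record]:
        if name not in rec.value:
            return rec
        # Build a new record; the original is left untouched.
        return replace(rec, value={**rec.value, name: "****"})
    return apply


def route_to(topic: str) -> Transform:
    def apply(rec: Record) -> Optional[Record]:
        return replace(rec, topic=topic)
    return apply


def drop_if_missing(name: str) -> Transform:
    def apply(rec: Record) -> Optional[Record]:
        return rec if name in rec.value else None
    return apply


def run_chain(records: Iterable[Record], chain: List[Transform]) -> List[Record]:
    """Apply each transformation in order to each record independently."""
    out: List[Record] = []
    for rec in records:
        current: Optional[Record] = rec
        for transform in chain:
            current = transform(current)
            if current is None:
                break  # a transformation dropped this record
        if current is not None:
            out.append(current)
    return out


records = [
    Record("users", {"id": 1, "ssn": "123-45-6789"}),
    Record("users", {"ssn": "987-65-4321"}),  # no "id": dropped by the chain
]
result = run_chain(records, [drop_if_missing("id"), mask_field("ssn"), route_to("users_masked")])
print(result)
```

Each step sees only one record, so steps like these need no state or coordination between tasks. Work that aggregates or joins records is generally better suited to a stream processor such as Kafka Streams.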


THANK YOU!

Gwen Shapira | @gwenshap

Visit us in the Confluent Booth (#217)

Kafka: The Definitive Guide book giveaway and signing

Making Sense of Stream Processing book giveaway

Kafka Training with Confluent University

Kafka Developer and Operations Courses

Visit www.confluent.io/training

Want more Kafka?

Download Confluent Platform Enterprise at http://www.confluent.io/product

Apache Kafka 0.10 upgrade documentation at http://docs.confluent.io/3.0.0/upgrade.html

Kafka Summit recordings now available at http://kafka-summit.org/schedule/