London Apache Kafka Meetup (Jan 2017)


Delivering Fast Data Systems with Kafka

LANDOOP | www.landoop.com

Antonios Chalkiopoulos 18/1/2017

@chalkiopoulos

- Open Source contributor
- Big Data projects in Media, Betting, Retail and Investment Banks in London
- Book author, "Programming MapReduce with Scalding"
- Founder of Landoop

DevOps | Big Data | Scala | Automation | Distributed Systems | Monitoring | Hadoop | Fast Data / Streams | Kafka

KAFKA CONNECT

a bit of context

KAFKA CONNECT

"a common framework for allowing stream data flow between Kafka and other systems"

Data is produced from a source and consumed into a sink.

[Diagram: Data Source → Kafka Connect → KAFKA → Kafka Connect → Data Sink]

[Diagram, repeated with Stream processing attached to KAFKA: the source connector Extracts, stream processing Transforms, the sink connector Loads: E / T / L]

Developers don't care about:

- Moving data to/from sink/source
- Supported delivery semantics
- Offset management
- Serialization / de-serialization
- Partitioning / scalability
- Fault tolerance / fail-over
- Schema Registry integration

Developers care about:

- Domain-specific transformations

CONNECTORS

Kafka Connect's framework allows developers to create connectors that copy data to/from other systems, just by writing configuration files and submitting them to Connect, with no code necessary.

Connector configurations are key-value mappings

name             the connector's unique name
connector.class  the connector's Java class
tasks.max        the maximum number of tasks to create
topics           the list of topics (to source or sink data)

Introducing a query language for the connectors

name             the connector's unique name
connector.class  the connector's Java class
tasks.max        the maximum number of tasks to create
topics           the list of topics (to source or sink data)
query            a KCQL query specifying fields/actions for the target system
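As a sketch (not on the slides): in distributed mode such a configuration is a JSON document POSTed to Connect's REST API. The connector class below is stream-reactor's Redis sink, and the exact name of the KCQL property (here connect.redis.sink.kcql) differs per connector and version, so treat the key and class names as illustrative:

curl -X POST -H "Content-Type: application/json" http://localhost:8083/connectors -d '{
  "name": "redis-fx-sink",
  "config": {
    "connector.class": "com.datamountaineer.streamreactor.connect.redis.sink.RedisSinkConnector",
    "tasks.max": "1",
    "topics": "yahooFX-topic",
    "connect.redis.sink.kcql": "SELECT price FROM yahooFX-topic PK symbol STOREAS SortedSet(score=ts)"
  }
}'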

KCQL | Kafka Connect Query Language

A SQL-like syntax allowing streamlined configuration of Kafka sink connectors, and then some more.

Example:

Project fields, rename or ignore them, and further customise in plain text:

INSERT INTO transactions SELECT field1 AS column1, field2 AS column2, field3 FROM TransactionTopic;
INSERT INTO audits SELECT * FROM AuditsTopic;
INSERT INTO logs SELECT * FROM LogsTopic AUTOEVOLVE;
INSERT INTO invoices SELECT * FROM InvoiceTopic PK invoiceID;

KCQL | What does it look like?

So while integrating Kafka with in-memory data grids, key-value stores, document stores, NoSQL, search and other systems, the general form is:

INSERT INTO $TARGET
SELECT *|columns (i.e. col1,col2 | col1 AS column1,col2)
FROM $TOPIC_NAME
  [ IGNORE columns ]
  [ AUTOCREATE ]
  [ PK columns ]
  [ AUTOEVOLVE ]
  [ BATCH = N ]
  [ CAPITALIZE ]
  [ INITIALIZE ]
  [ PARTITIONBY cola[,colb] ]
  [ DISTRIBUTEBY cola[,colb] ]
  [ CLUSTERBY cola[,colb] ]
  [ TIMESTAMP cola|sys_current ]
  [ STOREAS $YOUR_TYPE([key=value, .....]) ]
  [ WITHFORMAT TEXT|AVRO|JSON|BINARY|OBJECT|MAP ]

Why KCQL?

- Topic to target mapping
- Field selection
- Auto creation
- Auto evolution
- Error policies
- Multiple KCQLs / topic
- Field extraction
- Access to Key & Metadata

KCQL | Advanced Features Examples

Example Kafka topic with IoT data:

{ "sensor_id": "01", "temperature": 52.7943, "ts": 1484648810 }
{ "sensor_id": "02", "temperature": 28.8597, "ts": 1484648810 }

INSERT INTO sensor_ringbuffer SELECT sensor_id, temperature, ts FROM coap_sensor_topic WITHFORMAT JSON STOREAS RING_BUFFER

INSERT INTO sensor_reliabletopic SELECT sensor_id, temperature, ts FROM coap_sensor_topic WITHFORMAT AVRO STOREAS RELIABLE_TOPIC

INSERT INTO FXSortedSet SELECT symbol, price FROM yahooFX-topic STOREAS SortedSet(score=ts)

SELECT price FROM yahooFX-topic PK symbol STOREAS SortedSet(score=ts)

Example Kafka topic with FX data:

{ "symbol": "USDGBP", "price": 0.7943, "ts": 1484648810 }
{ "symbol": "EURGBP", "price": 0.8597, "ts": 1484648810 }

Sorted Set -> { value : score }, e.g. B:1 A:2 D:3 C:20
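For intuition (a sketch, not from the slides): with PK symbol STOREAS SortedSet(score=ts), the sink behaves roughly as if it issued one ZADD per record, maintaining a sorted set per symbol scored by ts; the exact key naming and payload layout depend on the connector version:

ZADD USDGBP 1484648810 '{"symbol":"USDGBP","price":0.7943,"ts":1484648810}'
ZRANGE USDGBP 0 -1 WITHSCORES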

Stream reactor connectors support KCQL

kafka-connect-blockchain kafka-connect-bloomberg kafka-connect-cassandra kafka-connect-coap kafka-connect-druid kafka-connect-elastic kafka-connect-ftp kafka-connect-hazelcast kafka-connect-hbase

kafka-connect-influxdb kafka-connect-jms kafka-connect-kudu kafka-connect-mongodb kafka-connect-mqtt kafka-connect-redis kafka-connect-rethink kafka-connect-voltdb kafka-connect-yahoo

Source: https://github.com/datamountaineer/stream-reactor
Integration tests: http://coyote.landoop.com/connect/

DEMO | Kafka Connect InfluxDB

We'll need:
• Zookeeper
• Kafka Broker
• Schema Registry
• Kafka Connect Distributed
• Kafka REST Proxy

We'll also use:
• StreamReactor connectors
• Landoop Fast Data Web Tools

docker run --rm -it \
  -p 2181:2181 -p 3030:3030 -p 8081:8081 \
  -p 8082:8082 -p 8083:8083 -p 9092:9092 \
  -e ADV_HOST=192.168.99.100 \
  landoop/fast-data-dev
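Once the container is up, Connect's REST API listens on port 8083 (exposed above). A couple of quick smoke tests, assuming the same ADV_HOST address:

curl http://192.168.99.100:8083/connector-plugins   # connector classes on the classpath
curl http://192.168.99.100:8083/connectors          # connectors currently deployed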

case class DeviceMeasurements(
  deviceId: Int,
  temperature: Int,
  moreData: String,
  timestamp: Long)

We’ll generate some Avro messages
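A minimal sketch (not from the talk) of a generator for such messages, using Avro's GenericRecordBuilder and the Confluent Avro serializer. The topic name device-measurements, the object name DeviceDataGenerator, and the addresses (matching the fast-data-dev ports above) are illustrative:

import java.util.Properties
import org.apache.avro.SchemaBuilder
import org.apache.avro.generic.GenericRecordBuilder
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object DeviceDataGenerator extends App {
  // Avro schema mirroring the DeviceMeasurements case class
  val schema = SchemaBuilder.record("DeviceMeasurements").fields()
    .requiredInt("deviceId")
    .requiredInt("temperature")
    .requiredString("moreData")
    .requiredLong("timestamp")
    .endRecord()

  val props = new Properties()
  props.put("bootstrap.servers", "192.168.99.100:9092") // matches ADV_HOST above
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer")
  props.put("schema.registry.url", "http://192.168.99.100:8081")

  val producer = new KafkaProducer[String, Object](props)
  val record = new GenericRecordBuilder(schema)
    .set("deviceId", 1)
    .set("temperature", 52)
    .set("moreData", "demo")
    .set("timestamp", System.currentTimeMillis())
    .build()
  // Illustrative topic name; the connector's KCQL would read FROM this topic
  producer.send(new ProducerRecord[String, Object]("device-measurements", "1", record))
  producer.close()
}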

DEMO | Kafka Development Environment @ fast-data-dev docker image

https://hub.docker.com/r/landoop/fast-data-dev/

DEMO | Integration testing with Coyote for connectors & infrastructure

https://github.com/Landoop/coyote

Schema Registry UI: https://github.com/Landoop/schema-registry-ui

Kafka Topics UI: https://github.com/Landoop/kafka-topics-ui

Kafka Connect UI: https://github.com/Landoop/kafka-connect-ui

How do I run it?

- Connectors performance
- Monitoring & alerting via JMX
- Deployment: apps, containers (Mesos, Kubernetes)
- Hadoop integration

* Stateless apps (Schema Registry, Kafka Connect) are container-friendly

Available features:
- Kafka ecosystem
- StreamReactor Connectors
- Landoop web tools
- Monitoring & Alerting
- Security features

Wrap up

- KCQL

- Connectors

- Kafka Web Tools

- Automation & Integrations

Coming up

- Kafka backend: enhanced UIs | Timetravel

$ locate

https://github.com/Landoop

https://hub.docker.com/r/landoop/

https://github.com/datamountaineer/stream-reactor

http://www.landoop.com

Thank you ;)