Introduction to Kafka Streams
Posted 16-Apr-2017, Category: Engineering
Transcript of Introduction to Kafka Streams
Kafka Streams Stream processing Made Simple with Kafka
1
Guozhang Wang Hadoop Summit, June 28, 2016
2
What is NOT Stream Processing?
3
Stream Processing isn't (necessarily)
Transient, approximate, lossy
.. such that you must have batch processing as a safety net
4 - 8 (image-only slides)
Stream Processing
A different programming paradigm
.. that brings computation to unbounded data
.. with tradeoffs between latency / cost / correctness
9
Why Kafka in Stream Processing?
10
Persistent Buffering
Logical Ordering
Scalable source-of-truth
Kafka: Real-time Platforms
11
Stream Processing with Kafka
12
Option I: Do It Yourself !
Stream Processing with Kafka
13
Option I: Do It Yourself !
Stream Processing with Kafka
while (isRunning) {
    // read some messages from Kafka
    inputMessages = consumer.poll();

    // do some processing

    // send output messages back to Kafka
    producer.send(outputMessages);
}
14
15
DIY Stream Processing is Hard

Ordering
Partitioning & Scalability
Fault tolerance
State Management
Time, Window & Out-of-order Data
Re-processing
16
Option I: Do It Yourself !
Option II: full-fledged stream processing system
Storm, Spark, Flink, Samza, ..
Stream Processing with Kafka
17
MapReduce Heritage?

Config Management
Resource Management
Deployment
etc..

Can I just use my own?!
20
Option I: Do It Yourself !
Option II: full-fledged stream processing system
Option III: lightweight stream processing library
Stream Processing with Kafka
Kafka Streams
In Apache Kafka since v0.10, May 2016
Powerful yet easy-to-use stream processing library Event-at-a-time, Stateful
Windowing with out-of-order handling
Highly scalable, distributed, fault tolerant
and more..
21
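"Windowing with out-of-order handling" from the feature list above can be illustrated with a plain-Java sketch (this is not the Kafka Streams API; `WindowedCounts`, `process`, and the one-minute window size are hypothetical names chosen for the example): records are assigned to time windows by their own event timestamp, so a record that arrives late still lands in, and updates, the window it logically belongs to.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not the Kafka Streams API): event-time windowed counts
// that tolerate out-of-order arrival by keying windows on the record's own
// timestamp rather than on arrival order.
public class WindowedCounts {
    static final long WINDOW_MS = 60_000;                    // assumed 1-minute windows
    final Map<Long, Map<String, Long>> windows = new HashMap<>();

    // Assign the record to a window by its event timestamp; a late record
    // simply updates the earlier window it belongs to.
    void process(String key, long eventTimeMs) {
        long windowStart = (eventTimeMs / WINDOW_MS) * WINDOW_MS;
        windows.computeIfAbsent(windowStart, w -> new HashMap<>())
               .merge(key, 1L, Long::sum);
    }

    long count(long windowStart, String key) {
        return windows.getOrDefault(windowStart, Map.of()).getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        WindowedCounts wc = new WindowedCounts();
        wc.process("alice", 5_000);     // window [0s, 60s)
        wc.process("alice", 61_000);    // window [60s, 120s)
        wc.process("alice", 10_000);    // late, out-of-order: still lands in [0s, 60s)
        System.out.println(wc.count(0, "alice"));       // 2
        System.out.println(wc.count(60_000, "alice"));  // 1
    }
}
```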
22
Anywhere, anytime
Ok. Ok. Ok. Ok.
23
Anywhere, anytime
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-streams</artifactId>
  <version>0.10.0.0</version>
</dependency>
24
Anywhere, anytime
War File / Rsync / Puppet/Chef / YARN / Mesos / Docker / Kubernetes
(Very Uncool ... Very Cool)
25
Simple is Beautiful
Kafka Streams DSL
26
public static void main(String[] args) {
    KStreamBuilder builder = new KStreamBuilder();
    // specify the processing topology by first reading in a stream from a topic
    KStream<String, String> words = builder.stream("topic1");
    // count the words in this stream as an aggregated table
    KTable<String, Long> counts = words.countByKey("Counts");
    // write the result table to a new topic
    counts.to("topic2");
    // create a stream processing instance and start running it
    KafkaStreams streams = new KafkaStreams(builder, config);
    streams.start();
}
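The table that `countByKey` materializes can be mimicked in plain Java (a sketch of the semantics only, not the Streams machinery; `countByKey` here is a hypothetical helper, not the DSL method): for each record key, keep a running count of records seen with that key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of what the countByKey() table in the DSL example computes:
// per key, the running count of records observed with that key.
public class CountByKey {
    static Map<String, Long> countByKey(List<String> keys) {
        Map<String, Long> counts = new HashMap<>();
        for (String k : keys) {
            counts.merge(k, 1L, Long::sum);   // increment the key's running count
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countByKey(List.of("kafka", "streams", "kafka"));
        System.out.println(counts.get("kafka"));    // 2
        System.out.println(counts.get("streams"));  // 1
    }
}
```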
32
Native Kafka Integration

Properties cfg = new Properties();
cfg.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
cfg.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
cfg.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
cfg.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_SSL");
cfg.put(KafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "registry:8081");
StreamsConfig config = new StreamsConfig(cfg);
KafkaStreams streams = new KafkaStreams(builder, config);
34
API, coding
Full stack evaluation
Operations, debugging,
35
Simple is Beautiful
36
Key Idea:
Outsource hard problems to Kafka!
Kafka Concepts: the Log
(Diagram: an append-only log of messages at offsets 3 4 5 6 7 8 9 10 11 12 ...;
the producer writes to the end of the log,
Consumer1 reads at offset 7,
Consumer2 reads at offset 10.)
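The log abstraction on this slide can be sketched in a few lines of plain Java (an illustrative toy, not Kafka's implementation; the `Log` class and its methods are hypothetical): an append-only list of messages, where each consumer tracks its own read offset independently of the others.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the log abstraction: an append-only message list,
// with each consumer holding its own read offset.
public class Log {
    final List<String> messages = new ArrayList<>();

    // producer write: append to the end of the log, return the new offset
    long append(String msg) {
        messages.add(msg);
        return messages.size() - 1;
    }

    // consumer read: fetch the message at the consumer's own offset
    String read(long offset) {
        return messages.get((int) offset);
    }

    public static void main(String[] args) {
        Log log = new Log();
        for (int i = 0; i <= 12; i++) log.append("m" + i);
        long consumer1Offset = 7, consumer2Offset = 10;  // independent positions
        System.out.println(log.read(consumer1Offset));   // m7
        System.out.println(log.read(consumer2Offset));   // m10
    }
}
```

Because offsets are per consumer, two consumers can read the same partition at different positions without interfering, which is what makes the log usable as a persistent buffer.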
(Diagram: Topic 1 and Topic 2 split into partitions spread across brokers;
producers write to the partitions, consumers read from them.)
Kafka Concepts: the Log
39
Kafka Streams: Key Concepts
Stream and Records
40
(Diagram: a stream as an ordered sequence of records, each record a key-value pair.)
Processor Topology
41
Stream
Processor Topology
42
Stream Processor
Processor Topology
43
KStream stream1 = builder.stream("topic1");
KStream stream2 = builder.stream("topic2");
KStream joined = stream1.leftJoin(stream2, ...);
KTable aggregated = joined.aggregateByKey(...);
aggregated.to("topic3");
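The leftJoin-then-aggregate semantics of this topology can be sketched in plain Java over in-memory maps (a toy model of the semantics only, not the Streams runtime; `leftJoinSum` and the sum-based join are hypothetical choices for illustration): each key from the left stream is paired with the matching right-side value, or with nothing if there is no match, and the joined values are then aggregated per key.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of leftJoin + aggregateByKey semantics:
// left keys always survive the join; missing right values contribute nothing,
// and the joined values are summed per key.
public class JoinAggregate {
    static Map<String, Integer> leftJoinSum(Map<String, Integer> stream1,
                                            Map<String, Integer> stream2) {
        Map<String, Integer> aggregated = new HashMap<>();
        for (Map.Entry<String, Integer> e : stream1.entrySet()) {
            Integer right = stream2.get(e.getKey());     // left join: null if no match
            int joined = e.getValue() + (right == null ? 0 : right);
            aggregated.merge(e.getKey(), joined, Integer::sum);
        }
        return aggregated;
    }

    public static void main(String[] args) {
        Map<String, Integer> s1 = Map.of("a", 1, "b", 2);
        Map<String, Integer> s2 = Map.of("a", 10);       // no entry for "b"
        Map<String, Integer> out = leftJoinSum(s1, s2);
        System.out.println(out.get("a"));  // 11
        System.out.println(out.get("b"));  // 2
    }
}
```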