Introduction to Kafka Streams


  • Kafka Streams: Stream Processing Made Simple with Kafka

    1

    Guozhang Wang, Hadoop Summit, June 28, 2016

  • 2

    What is NOT Stream Processing?

  • 3

    Stream processing isn't (necessarily)

    transient, approximate, or lossy

    .. such that you must have batch processing as a safety net

  • 4–7

    (image-only slides)

  • 8

    Stream Processing

    A different programming paradigm

    .. that brings computation to unbounded data

    .. with tradeoffs between latency / cost / correctness

  • 9

    Why Kafka in Stream Processing?

  • 10

    Kafka: Real-time Platform

    Persistent Buffering

    Logical Ordering

    Scalable source-of-truth

  • 11

    Stream Processing with Kafka

  • 12

    Option I: Do It Yourself!

    Stream Processing with Kafka

  • 13

    Option I: Do It Yourself!

    Stream Processing with Kafka

    while (isRunning) {
        // read some messages from Kafka
        inputMessages = consumer.poll();

        // do some processing
        ...

        // send output messages back to Kafka
        producer.send(outputMessages);
    }

  • 14

  • 15

    DIY Stream Processing is Hard

    Ordering

    Partitioning & Scalability

    Fault tolerance

    State Management

    Time, Window & Out-of-order Data

    Re-processing
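    Each of these bullets is real engineering once the loop above has to be
    honest about failures. A rough sketch of just the offset-handling part,
    written against the plain consumer/producer clients (topic names, group id,
    and the pass-through "processing" are invented for illustration):

    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;
    import org.apache.kafka.common.serialization.ByteArraySerializer;

    public class DiyProcessor {
        public static void main(String[] args) {
            Properties cc = new Properties();
            cc.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
            cc.put(ConsumerConfig.GROUP_ID_CONFIG, "diy-processor");
            // auto-commit would acknowledge records before they are processed,
            // so commits have to be managed by hand
            cc.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
            cc.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
            cc.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());

            Properties pc = new Properties();
            pc.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
            pc.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
            pc.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(cc);
                 KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(pc)) {
                consumer.subscribe(Collections.singletonList("input-topic"));
                while (true) {
                    ConsumerRecords<byte[], byte[]> records = consumer.poll(100);
                    for (ConsumerRecord<byte[], byte[]> rec : records) {
                        // "do some processing" -- here just a pass-through
                        producer.send(new ProducerRecord<>("output-topic", rec.key(), rec.value()));
                    }
                    // at-least-once at best: a crash between send and commit
                    // re-emits records; state, windowing, out-of-order data and
                    // rebalances are still entirely unsolved
                    consumer.commitSync();
                }
            }
        }
    }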

  • 16

    Option I: Do It Yourself!

    Option II: full-fledged stream processing system

    Storm, Spark, Flink, Samza, ..

    Stream Processing with Kafka

  • 17–19

    MapReduce Heritage?

    Config Management

    Resource Management

    Deployment

    etc..

    Can I just use my own?!

  • 20

    Option I: Do It Yourself!

    Option II: full-fledged stream processing system

    Option III: lightweight stream processing library

    Stream Processing with Kafka

  • Kafka Streams

    In Apache Kafka since v0.10, May 2016

    Powerful yet easy-to-use stream processing library

    Event-at-a-time, Stateful

    Windowing with out-of-order handling

    Highly scalable, distributed, fault tolerant

    and more..

    21

  • 22

    Anywhere, anytime

    Ok. Ok. Ok. Ok.

  • 23

    Anywhere, anytime

    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-streams</artifactId>
        <version>0.10.0.0</version>
    </dependency>

  • 24

    Anywhere, anytime

    (deployment spectrum, from "Very Uncool" to "Very Cool":)

    War File, Rsync, Puppet/Chef, YARN, Mesos, Docker, Kubernetes

  • 25

    Simple is Beautiful

  • Kafka Streams DSL

    26

    public static void main(String[] args) {
        // specify the processing topology by first reading in a stream from a topic
        KStream<String, String> words = builder.stream("topic1");

        // count the words in this stream as an aggregated table
        KTable<String, Long> counts = words.countByKey("Counts");

        // write the result table to a new topic
        counts.to("topic2");

        // create a stream processing instance and start running it
        KafkaStreams streams = new KafkaStreams(builder, config);
        streams.start();
    }
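    The slide shows only the topology; the builder and config are left implicit.
    For context, a complete runnable version against the 0.10.0.0 API might look
    like this (application id, broker address, and serde choices are
    illustrative, not from the deck):

    import java.util.Properties;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KStreamBuilder;
    import org.apache.kafka.streams.kstream.KTable;

    public class WordCountApp {
        public static void main(String[] args) {
            Properties cfg = new Properties();
            cfg.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-app");
            cfg.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            // default serdes used by builder.stream() when none are given
            cfg.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
            cfg.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

            KStreamBuilder builder = new KStreamBuilder();

            // records are assumed to be keyed by word already
            KStream<String, String> words = builder.stream("topic1");

            // count occurrences per key into a changelog-backed table
            KTable<String, Long> counts = words.countByKey("Counts");

            // write the continuously updating counts to an output topic
            counts.to(Serdes.String(), Serdes.Long(), "topic2");

            KafkaStreams streams = new KafkaStreams(builder, new StreamsConfig(cfg));
            streams.start();
        }
    }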

  • 27–31

    (build-up slides stepping through the same code, line by line)

  • 32

    Native Kafka Integration

    Properties cfg = new Properties();
    cfg.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
    cfg.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
    cfg.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    cfg.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SASL_SSL");
    cfg.put(KafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, "registry:8081");

    StreamsConfig config = new StreamsConfig(cfg);
    KafkaStreams streams = new KafkaStreams(builder, config);
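    Since the resulting app is an ordinary Java process, stopping it cleanly is
    ordinary Java as well; a minimal sketch:

    // release this instance's threads, consumers and producers on JVM shutdown
    Runtime.getRuntime().addShutdownHook(new Thread() {
        @Override
        public void run() {
            streams.close();
        }
    });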

  • 33

    (repeat of the previous slide)

  • 34–35

    API, coding

    Full stack evaluation

    Operations, debugging, ..

    Simple is Beautiful

  • 36

    Key Idea:

    Outsource hard problems to Kafka!

  • Kafka Concepts: the Log

    (Diagram: an append-only log of messages at offsets 3 through 12...; the
    producer writes to the end of the log while Consumer1 reads at offset 7 and
    Consumer2 reads at offset 10.)
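    A consumer's position is just an offset it can read from, commit, or rewind,
    and reading removes nothing from the log. A hedged snippet with the plain
    consumer client (the offset comes from the diagram; topic, partition, and
    config are invented):

    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;

    public class LogReader {
        public static void main(String[] args) {
            Properties cfg = new Properties();
            cfg.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");
            cfg.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
            cfg.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(cfg)) {
                TopicPartition tp = new TopicPartition("topic1", 0);
                consumer.assign(Collections.singletonList(tp)); // manual assignment, no group
                consumer.seek(tp, 7L);                          // pin this reader at offset 7
                // another consumer can sit at offset 10 of the same partition
                // at the same time, like Consumer2 in the diagram
                ConsumerRecords<byte[], byte[]> records = consumer.poll(100);
            }
        }
    }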

  • Kafka Concepts: the Log

    (Diagram: Topic 1 and Topic 2 are split into partitions spread across the
    brokers; producers write to and consumers read from the partitions.)

  • 39

    Kafka Streams: Key Concepts

  • Stream and Records

    40

    (Diagram: a stream is an ordered sequence of records, each a key-value pair.)
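    In code a record is just a key-value pair; the library ships a small holder
    type for it (the values here are made up):

    import org.apache.kafka.streams.KeyValue;

    // a stream is an unbounded, ordered sequence of records like this one
    KeyValue<String, Long> record = KeyValue.pair("page-view", 42L);
    System.out.println(record.key + " -> " + record.value);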

  • Processor Topology

    41

    Stream

  • Processor Topology

    42

    Stream, Processor

  • Processor Topology

    43

    KStream stream1 = builder.stream("topic1");
    KStream stream2 = builder.stream("topic2");

    KStream joined = stream1.leftJoin(stream2, ...);
    KTable aggregated = joined.aggregateByKey(...);

    aggregated.to("topic3");
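    The "..." arguments are elided on the slide. One hedged way to fill them in
    under the 0.10.0.0 API; the value types, serdes, window size, and both
    lambdas are illustrative, not from the deck:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.kstream.JoinWindows;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;

    KStream<String, String> stream1 = builder.stream("topic1");
    KStream<String, String> stream2 = builder.stream("topic2");

    // the ValueJoiner combines the two sides; right may be null for a left join
    KStream<String, String> joined = stream1.leftJoin(
            stream2,
            (left, right) -> left + "/" + right,
            JoinWindows.of("join-window").within(5000L), // join records at most 5s apart
            Serdes.String(),    // key serde
            Serdes.String());   // the other stream's value serde

    // fold the joined records per key into a table backed by a local state store
    KTable<String, Long> aggregated = joined.aggregateByKey(
            () -> 0L,                      // Initializer: start each key at zero
            (key, value, agg) -> agg + 1L, // Aggregator: bump the running count
            Serdes.String(),
            Serdes.Long(),
            "Agg");                        // state store name

    aggregated.to(Serdes.String(), Serdes.Long(), "topic3");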

  • 44–47

    (build-up slides stepping through the slide-43 topology code, line by line)