噛み砕いてKafka Streams (Kafka Streams, Explained Simply) #kafkajp


  • 2016/12/15

    1

    Kafka Streams

  • @kokumutyoukan

    Kafka, Storm, Cassandra, Elasticsearch

    2

  • Kafka Streams

    3

  • Word Count

    Time, Window, Join

    4

  • Word Count

    Time, Window, Join

    5

  • Kafka Streams is

    a stream-processing library bundled with Apache Kafka

    since 0.10.0 (released May 2016)

    6

    development led by Confluent

  • Kafka Java API

    Kafka Streams

    7

  • Kafka Java API

    It is just a Java library (a jar): you add it to a plain Java application and call the API.

    8

    Consumer<String, String> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(topics);
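
    To make the snippet above concrete, here is a minimal sketch of a plain-Java consumer loop; the broker address, group id, and deserializer settings are assumptions for illustration, and the topic name is borrowed from the Word Count example later in the deck.

    import java.util.Arrays;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.Consumer;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class PlainConsumerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
            props.put("group.id", "plain-consumer-example");  // assumed consumer group
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");

            Consumer<String, String> consumer = new KafkaConsumer<>(props);
            consumer.subscribe(Arrays.asList("search-query-topic"));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.key() + " -> " + record.value());
                }
            }
        }
    }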

  • Storm

    9

  • Kafka Streams

    Compared with Spark Streaming and Storm: the processing guarantee is at-least-once, which is OK for many use cases.

    10

    It is just a jar: you run it as a plain Java application, with no dedicated processing cluster.

  • 11

    Kafka Streams builds on top of the Kafka Java API.

  • Word Count

    Time, Window, Join

    12

  • 13

    Get it with Maven:

    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-streams</artifactId>
      <version>0.10.0.1</version>
    </dependency>

    The API has two levels: a high-level DSL and a low-level API (the Processor API).
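
    The Word Count that follows uses the high-level DSL. As a rough illustration of what the low-level Processor API looks like, here is a minimal sketch; the class, topic, and processor names are made up for the example.

    import org.apache.kafka.streams.processor.Processor;
    import org.apache.kafka.streams.processor.ProcessorContext;

    // A low-level Processor that forwards each record downstream with an upper-cased value.
    public class UpperCaseProcessor implements Processor<String, String> {
        private ProcessorContext context;

        @Override
        public void init(ProcessorContext context) {
            this.context = context;
        }

        @Override
        public void process(String key, String value) {
            context.forward(key, value == null ? null : value.toUpperCase());
        }

        @Override
        public void punctuate(long timestamp) {
            // no scheduled work in this sketch
        }

        @Override
        public void close() {
        }
    }

    // Wiring it into a topology (names are assumptions):
    // TopologyBuilder builder = new TopologyBuilder();
    // builder.addSource("Source", "input-topic")
    //        .addProcessor("Upper", UpperCaseProcessor::new, "Source")
    //        .addSink("Sink", "output-topic", "Upper");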

  • 14

    @Test
    public final void wordCount() {

        KStreamBuilder builder = new KStreamBuilder();

        // read the source topic as a stream of search queries
        KStream<String, String> queryStream =
            builder.stream(stringSerde, stringSerde, "search-query-topic");

        KStream<String, Long> wordCounts = queryStream
            .flatMapValues(value -> Arrays.asList(value.split("\\s+"))) // split each query into words
            .map((key, value) -> new KeyValue<>(value, value))          // make the word the key
            .countByKey(stringSerde, "Counts")                          // KStream -> KTable of running counts
            .toStream();                                                // KTable -> KStream

        wordCounts.to(stringSerde, longSerde, "wordcount-output"); // write to the sink topic

        KafkaStreams streams = new KafkaStreams(builder, props); // props holds the Kafka Streams configuration
        streams.start();
    }

  • 15

    // produce a few test queries (the query payloads are omitted here)
    producer.send(new ProducerRecord<>("search-query-topic", ...));
    producer.send(new ProducerRecord<>("search-query-topic", ...));
    producer.send(new ProducerRecord<>("search-query-topic", ...));

    consumer.subscribe(Arrays.asList("wordcount-output"));
    while (true) {
        ConsumerRecords<String, Long> records = consumer.poll(100);
        for (ConsumerRecord<String, Long> record : records) {
            System.out.println("record = " + record.key() + ", " + record.value());
        }
    }
    // output, one (word, count) pair per record, e.g.:
    // record = <word>, 1
    // record = <word>, 1
    // record = <word>, 1
    // record = <word>, 2
    // record = <word>, 1
    // record = <word>, 2

    For how to unit-test against Kafka, see the Kafka FAQ.

  • KStream? KTable?

    KStream

    models a record stream: every record is an independent event

    e.g. page views (PV)

    KTable

    models a changelog stream: each record is an update (upsert) for its key

    the latest value per key is kept as local State

    16
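
    A minimal sketch of the difference; the topic and variable names are assumptions for illustration. Page views are naturally read as a KStream (every record counts), while a per-user profile topic is naturally read as a KTable (later records overwrite earlier ones with the same key).

    import org.apache.kafka.common.serialization.Serde;
    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KStreamBuilder;
    import org.apache.kafka.streams.kstream.KTable;

    KStreamBuilder builder = new KStreamBuilder();
    Serde<String> stringSerde = Serdes.String();

    // KStream: every record is an independent event (one record per page view)
    KStream<String, String> pageViews =
        builder.stream(stringSerde, stringSerde, "pageview-topic");

    // KTable: records with the same key are upserts; only the latest value per key is current
    KTable<String, String> userProfiles =
        builder.table(stringSerde, stringSerde, "user-profile-topic");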

  • Word Count

    Time, Window, Join

    17

  • Time

    Which "time" should stream processing use? Consider a tweet flowing through this pipeline:

    1. a user posts a tweet (Tweet!)

    2. my BE server fetches it via the Twitter API and sends it to Kafka

    3. Kafka stores the message

    4. Kafka Streams processes it

    18

  • Time

    1. a user posts a tweet

    2. my BE server fetches it via the Twitter API and sends it to Kafka

    3. Kafka stores the message

    4. Kafka Streams processes it

    Steps 1-4 all happen at different times; which one Kafka Streams uses is controlled by the timestamp.extractor setting.

    19

  • Kafka Streams distinguishes three notions of time

    event-time: when the event actually happened

    with Kafka 0.10 the producer API embeds a timestamp in each message (broker setting log.message.timestamp.type=CreateTime)

    messages from 0.9 producers carry no timestamp (it is recorded as -1), because timestamps were only added to the message format in 0.10

    ingestion-time: when Kafka received the message

    log.message.timestamp.type=LogAppendTime makes the broker stamp each message as it is appended to the log

    processing-time: when Kafka Streams processes the record

    20
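
    As a small illustration of event-time being carried in the message itself: since 0.10 the producer can set the record timestamp explicitly. A minimal sketch; the topic, key, value, and timestamp are assumptions.

    import org.apache.kafka.clients.producer.ProducerRecord;

    long eventTime = 1481760000000L; // when the event actually happened (example value)

    // ProducerRecord(topic, partition, timestamp, key, value) - available since Kafka 0.10
    ProducerRecord<String, String> record =
        new ProducerRecord<>("search-query-topic", null, eventTime, "user-1", "kafka streams");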

  • timestamp.extractor

    21

    Which Time is used is selected with timestamp.extractor:

    event-time: ConsumerRecordTimestampExtractor (reads the timestamp embedded in the message)

    ingestion-time: ConsumerRecordTimestampExtractor (same extractor; here the broker wrote the timestamp)

    processing-time: WallclockTimestampExtractor

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.processor.WallclockTimestampExtractor;

    Properties props = new Properties();
    props.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
              WallclockTimestampExtractor.class.getName());

  • 22

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.streams.processor.TimestampExtractor;

    // a custom TimestampExtractor that pulls event-time out of the message payload
    public class MyEventTimeExtractor implements TimestampExtractor {

        @Override
        public long extract(ConsumerRecord<Object, Object> record) {
            // read the embedded timestamp from the payload
            Foo myPojo = (Foo) record.value();
            if (myPojo != null) {
                return myPojo.getTimestampInMillis();
            } else {
                // the value may be null; fall back to the current wall-clock time
                return System.currentTimeMillis();
            }
        }
    }

    http://docs.confluent.io/3.0.0/streams/developer-guide.html#timestamp-extractor
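
    To use such an extractor, register it through the same config key shown on the previous slide; a short sketch, assuming MyEventTimeExtractor above is on the classpath.

    Properties props = new Properties();
    props.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG,
              MyEventTimeExtractor.class.getName());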

  • Window

    23

    Tumbling time window: fixed, non-overlapping windows (e.g. page views (PV) per 5 minutes)

    Hopping time window: overlapping windows that advance by a smaller step (e.g. every 1 minute)

    KStream<String, String> viewsByUser = ...; // page-view stream keyed by user ID
    KTable<Windowed<String>, Long> userCounts =
        viewsByUser.countByKey(TimeWindows.of("WindowName", 5 * 60 * 1000L));

    // hopping window: 5-minute windows advancing every 1 minute
    TimeWindows.of("WindowName", 5 * 60 * 1000L).advanceBy(60 * 1000L);
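
    Putting the two fragments together, a minimal end-to-end sketch of a hopping-window count; the topic names, window name, and the key flattening are assumptions for illustration.

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KStreamBuilder;
    import org.apache.kafka.streams.kstream.KTable;
    import org.apache.kafka.streams.kstream.TimeWindows;
    import org.apache.kafka.streams.kstream.Windowed;

    KStreamBuilder builder = new KStreamBuilder();

    // page-view stream keyed by user ID
    KStream<String, String> viewsByUser =
        builder.stream(Serdes.String(), Serdes.String(), "pageview-topic");

    // 5-minute windows advancing every 1 minute
    KTable<Windowed<String>, Long> userCounts = viewsByUser.countByKey(
        TimeWindows.of("PageViewsPerUser", 5 * 60 * 1000L).advanceBy(60 * 1000L),
        Serdes.String());

    // flatten the windowed key to "user@windowStart" so the counts can go to an ordinary topic
    userCounts.toStream()
        .map((windowedUser, count) ->
            new KeyValue<>(windowedUser.key() + "@" + windowedUser.window().start(), count))
        .to(Serdes.String(), Serdes.Long(), "pageview-counts");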

  • Join

    24

    Join: enrich a stream by looking each record up in a KTable (the KTable plays the role of a KVS / RDB lookup table)

    KStream<String, String> voteRegionStream = ...; // from "vote-topic"
    KTable<String, String> partyTable = ...;        // from "party-topic"

    KStream<String, String> voteParty = voteRegionStream.leftJoin(
        partyTable, (region, party) -> region + ", " + party);

    stream record:  k: Hillary  v: California

    table (candidate -> party):  Hillary -> Democratic,  Trump -> Republican

    joined output:  k: Hillary  v: California, Democratic
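
    A more complete sketch of the same stream-table join; the topic names follow the slide, while the serdes, builder wiring, and output topic are assumptions.

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KStreamBuilder;
    import org.apache.kafka.streams.kstream.KTable;

    KStreamBuilder builder = new KStreamBuilder();

    // votes keyed by candidate, value = the voter's region
    KStream<String, String> voteRegionStream =
        builder.stream(Serdes.String(), Serdes.String(), "vote-topic");

    // candidate -> party mapping, kept up to date like a lookup table
    KTable<String, String> partyTable =
        builder.table(Serdes.String(), Serdes.String(), "party-topic");

    // left join on the key (candidate): each vote is enriched with the candidate's party
    KStream<String, String> voteParty = voteRegionStream.leftJoin(
        partyTable, (region, party) -> region + ", " + party);

    voteParty.to(Serdes.String(), Serdes.String(), "vote-party-topic");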

  • Word Count

    Time, Window, Join

    25

  • Kafka Streams caveats

    The sink is Kafka: results are written back to Kafka topics.

    KAFKA-4160: a known issue in Kafka Streams at the time of the talk (consumer-related).

    1 topic / 60

    26

  • Kafka 0.10.1.0 client, server 0.10.0.1

    Kafka Streams version compatibility:

    "Apps built with Kafka Streams 0.10.1 only work against Kafka clusters running 0.10.1+."

    A 0.10.1 Kafka Streams app run against a 0.10.0.1 Kafka cluster fails;

    the brokers have to be on 0.10.1 first.

    27

    [ASCII art: the app throws an Exception and is told "Use 0.10.1!"]

  • Kafka Streams

    Kafka Streams is a library built on the Kafka Java API.

    The Kafka Streams API makes stream processing on Kafka easy to write.

    All you need is one Kafka cluster.

    28

  • Word Count

    Time, Window, Join

    29

  • Kafka Streams

    Kafka Streams

    30

  • Appendix

    31

  • Kafka Streams

    32

    [Architecture diagram: data enters Kafka through Kafka Connect into a source topic; the Kafka Streams application, a plain Java program using the Kafka consumer/producer, reads the source topic, keeps intermediate results in internal topics, and writes results to a sink topic, which Kafka Connect can export again.]

  • Configuration

    33

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.consumer.ConsumerConfig;

    Properties settings = new Properties();
    // the first three StreamsConfig settings below are the essential ones
    settings.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
    settings.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    settings.put(StreamsConfig.ZOOKEEPER_CONNECT_CONFIG, "localhost:2181");
    settings.put(ProducerConfig...., ...); // producer settings can be passed through as well
    settings.put(ConsumerConfig...., ...); // and so can consumer settings

    application.id       identifies the application; used as the consumer group id and as the prefix of internal topic names
    bootstrap.servers    Kafka broker host/port list
    zookeeper.connect    ZooKeeper host:port/chroot
    num.stream.threads   number of stream threads
    replication.factor   replication factor of the internal topics
    state.dir            directory for the local State Store
    timestamp.extractor  the TimestampExtractor class to use
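
    A short sketch of what the elided ProducerConfig / ConsumerConfig lines could look like; the concrete settings are assumptions for illustration, and builder is the KStreamBuilder from the Word Count example.

    settings.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); // read topics from the beginning
    settings.put(ProducerConfig.ACKS_CONFIG, "all");                   // wait for full acknowledgement on writes

    StreamsConfig config = new StreamsConfig(settings);
    KafkaStreams streams = new KafkaStreams(builder, config);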

  • State is backed up to Kafka in a changelog topic

    34

    [Diagram: two nodes, each running one task; the task on the first node consumes source partition 1 and logs its state changes to changelog partition 1, the task on the second node consumes source partition 0 and logs to changelog partition 0.]

  • State is backed up to Kafka in a changelog topic

    35

    [Diagram: when a node fails, its task (reading source partition 0) is reassigned to the surviving node, which rebuilds the task's State from changelog partition 0 before resuming.]

  • The changelog topics are created by Kafka Streams itself

    they are created even if the cluster has auto.create.topics.enable=false

    they use log compaction (cleanup.policy=compact), so only the latest value per key is retained

    36

  • Word Count, full version

    37

    @Test
    public final void wordCount() {

        final Serde<String> stringSerde = Serdes.String(); // a Serde bundles the Serializer/Deserializer Kafka needs
        final Serde<Long> longSerde = Serdes.Long();       // Serdes provides the common built-in ones

        KStreamBuilder builder = new KStreamBuilder();

        // create the KStream; args: 1: key Serde, 2: value Serde, 3: source topic
        KStream<String, String> queryStream =
            builder.stream(stringSerde, stringSerde, "search-query-topic");

        KStream<String, Long> wordCounts = queryStream
            // split each value (a query string) into words
            .flatMapValues(value -> Arrays.asList(value.split("\\s+")))
            // make the word both key and value
            .map((key, value) -> new KeyValue<>(value, value))
            .countByKey(stringSerde, "Counts") // KStream -> KTable of running counts
            .toStream();                       // KTable -> KStream

        wordCounts.to(stringSerde, longSerde, "wordcount-output"); // write to the sink topic

        KafkaStreams streams = new KafkaStreams(builder, props); // props: Kafka Streams and client Properties
        streams.start();
    }

  • full

    38

    Kafka Java API

    ?

    Kafka Streams