Introducing Kafka's Streams API

  • Date posted: 12-Apr-2017
  • Category: Software


Transcript of Introducing Kafka's Streams API

  • Slide 1 (Confidential)

    Introducing Kafka's Streams API: stream processing made simple

    Target audience: technical staff, developers, architects
    Expected duration for full deck: 45 minutes

  • Slide 2

    Apache Kafka: birthed as a messaging system, now a streaming platform

    [Timeline, 2012–2017:]
    0.7 (2012): cluster mirroring, data compression
    0.8 (2013): intra-cluster replication
    0.9 (2015): data integration (Connect API)
    0.10 (2016): data processing (Streams API)

  • Slide 3

    Kafka's Streams API: the easiest way to process data in Apache Kafka

    Key benefits of Apache Kafka's Streams API:
    • Build Apps, Not Clusters: no additional cluster required
    • Cluster to go: elastic, scalable, distributed, fault-tolerant, secure
    • Database to go: tables, local state, interactive queries
    • Equally viable for S / M / L / XL / XXL use cases
    • Runs everywhere: integrates with your existing deployment strategies such as containers, automation, cloud

    Part of open source Apache Kafka, introduced in 0.10+:
    • Powerful client library to build stream processing apps
    • Apps are standard Java applications that run on client machines
    https://github.com/apache/kafka/tree/trunk/streams

    [Diagram: Your App embeds the Streams API library and reads from / writes to the Kafka Cluster]

  • Slide 4

    Kafka's Streams API: Unix analogy

    $ cat < in.txt | grep apache | tr a-z A-Z > out.txt

    [Diagram: in the analogy, the Connect API plays the role of input/output redirection with the Kafka Cluster, while the Streams API plays the role of the processing steps]
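The pipeline on this slide can be tried directly in a shell; the sample input below is an assumption for illustration:

```shell
# Sample input (assumed): two lines, only one contains "apache"
printf 'apache kafka\nhello world\n' > in.txt

# The pipeline from the slide: read, filter for "apache", uppercase, write
cat < in.txt | grep apache | tr a-z A-Z > out.txt

cat out.txt  # APACHE KAFKA
```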

  • Slide 5

    Streams API in the context of Kafka

    [Diagram: other systems feed data into and out of the Kafka Cluster via the Connect API; Your App processes that data with the Streams API]

  • Slide 6

    When to use Kafka's Streams API

    Mainstream application development, to build core business applications:
    • Microservices
    • Fast Data apps for small and big data
    • Reactive applications
    • Continuous queries and transformations
    • Event-triggered processes
    • The "T" in ETL

    Use case examples:
    • Real-time monitoring and intelligence
    • Customer 360-degree view
    • Fraud detection
    • Location-based marketing
    • Fleet management

  • Slide 7

    Some public use cases in the wild & external articles

    • Applying Kafka's Streams API for an internal message delivery pipeline at LINE Corp.; Kafka Streams in production at LINE, a social platform based in Japan with 220+ million users
      http://developers.linecorp.com/blog/?p=3960
    • Microservices and reactive applications at Capital One
      https://speakerdeck.com/bobbycalderwood/commander-decoupled-immutable-rest-apis-with-kafka-streams
    • User behavior analysis
      https://timothyrenner.github.io/engineering/2016/08/11/kafka-streams-not-looking-at-facebook.html
    • Containerized Kafka Streams applications in Scala
      https://www.madewithtea.com/processing-tweets-with-kafka-streams.html
    • Geo-spatial data analysis
      http://www.infolace.com/blog/2016/07/14/simple-spatial-windowing-with-kafka-streams/
    • Language classification with machine learning
      https://dzone.com/articles/machine-learning-with-kafka-streams

  • Slide 8

    Do more with less

  • Slide 9

    Architecture comparison: use case example

    Real-time dashboard for security monitoring: which of my data centers are under attack?

  • Slide 10

    Architecture comparison: use case example

    Before (without Kafka Streams): undue complexity, heavy footprint, many technologies, split ownership with conflicting priorities
    1. Capture business events in Kafka
    2. Must process events with a separate cluster (e.g. Spark)
    3. Must share latest results through separate systems (e.g. MySQL)
    4. Other apps access the latest results by querying these DBs

    With Kafka Streams: simplified, app-centric architecture that puts app owners in control
    1. Capture business events in Kafka
    2. Process events with standard Java apps that use Kafka Streams
    3. Now other apps can directly query the latest results

  • Slides 11–12 (image-only slides, no recoverable text)

  • Slide 13

    How do I install the Streams API?

    There is, and there should be, no installation. Build Apps, Not Clusters! It's a library: add it to your app like any other library.

    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-streams</artifactId>
        <version>0.10.1.1</version>
    </dependency>

  • Slide 14

    But wait a minute: where's THE CLUSTER to process the data?

    No cluster needed. Build Apps, Not Clusters! Unlearn bad habits: "doing cool stuff with data" does not require "having a cluster".

    Ok. Ok. Ok.

  • Slide 16

    Organizational benefits: decouple teams and roadmaps, scale people

    Infrastructure team: operates Kafka as a shared, multi-tenant service

    Application teams build on top of it:
    • Payments team: fraud detection app
    • Mobile team: recommendations app
    • Operations team: security alerts app
    • ...more apps...

  • Slide 17

    How do I package, deploy, monitor my apps? How do I …?

    Whatever works for you. Stick to what you/your company think is the best way. No magic needed. Why? Because an app that uses the Streams API is a normal Java app.

  • Slide 18

    Available APIs

  • Slide 19

    The API is but the tip of the iceberg

    [Iceberg diagram: above the waterline: API, coding; below, the larger reality: architecture, deployment, operations, security, debugging, organizational processes]

  • Slide 20

    API option 1: DSL (declarative)

    KStream<String, Integer> input = builder.stream("numbers-topic");

    // Stateless computation
    KStream<String, Integer> doubled = input.mapValues(v -> v * 2);

    // Stateful computation
    KTable<Integer, Integer> sumOfOdds = input
        .filter((k, v) -> v % 2 != 0)
        .selectKey((k, v) -> 1)
        .groupByKey()
        .reduce((v1, v2) -> v1 + v2, "sum-of-odds");

    The preferred API for most use cases.

    Particularly appeals to:

    Fans of Scala, functional programming

    Users familiar with e.g. Spark
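For intuition only, the two computations in the DSL snippet above can be mimicked with plain Java streams (no Kafka involved; the input numbers are an assumption):

```java
import java.util.List;
import java.util.stream.Collectors;

public class DslLogicSketch {
    public static void main(String[] args) {
        // Assumed sample values flowing through the "numbers-topic" stream
        List<Integer> input = List.of(1, 2, 3, 4, 5);

        // mapValues(v -> v * 2): stateless, one output per input record
        List<Integer> doubled = input.stream()
            .map(v -> v * 2)
            .collect(Collectors.toList());

        // filter(odd) + groupByKey + reduce(sum): a single running aggregate
        int sumOfOdds = input.stream()
            .filter(v -> v % 2 != 0)
            .reduce(0, Integer::sum);

        System.out.println(doubled);    // [2, 4, 6, 8, 10]
        System.out.println(sumOfOdds);  // 9
    }
}
```

The difference in the real API is that the DSL versions run continuously over an unbounded stream, emitting updated results as new records arrive.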

  • Slide 21

    API option 2: Processor API (imperative)

    class PrintToConsoleProcessor<K, V> implements Processor<K, V> {

      @Override
      public void init(ProcessorContext context) {}

      @Override
      public void process(K key, V value) {
        System.out.println("Got value " + value);
      }

      @Override
      public void punctuate(long timestamp) {}

      @Override
      public void close() {}
    }

    Full flexibility but more manual work

    Appeals to:

    Users who require functionality that is not yet available in the DSL

    Users familiar with e.g. Storm, Samza

    Still, check out the DSL!

  • Slide 22

    When to use Kafka Streams vs. Kafka's normal consumer clients

    Kafka Streams

    Basically all the time

    Kafka consumer clients (Java, C/C++, Python, Go, …)

    When you must interact with Kafka at a very low level and/or in a very special way. Example: when integrating your own stream processing tool (Spark, Storm) with Kafka.

  • Slide 23

    Code comparison: featuring Kafka's Streams API and Spark Streaming

  • Slide 24

    My WordCount is better than your WordCount (?)

    [Side-by-side code screenshots: Kafka vs. Spark]

    These isolated code snippets are nice (and actually quite similar), but they are not very meaningful. In practice, we also need to read data from somewhere, write data back to somewhere, and so on, but we can see none of this here.

  • Slide 25

    WordCount in Kafka

    [Code screenshot: WordCount with the Streams API]

  • Slide 26

    Compared to: WordCount in Spark 2.0

    [Code screenshot with callouts 1–3]

    Runtime model leaks into processing logic (here: interfacing from Spark with Kafka)

  • Slide 27

    Compared to: WordCount in Spark 2.0

    [Code screenshot with callouts 4–5]

    Runtime model leaks into processing logic (driver vs. executors)

  • Slide 28

    Key concepts

  • Slide 31

    Key concepts

    Kafka Core vs. Kafka Streams

  • Slide 32

    Streams and Tables: stream processing meets databases

  • Slides 33–34 (image-only slides, no recoverable text)

  • Slide 35

    Key observation: close relationship between Streams and Tables

    http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple
    http://docs.confluent.io/current/streams/concepts.html#duality-of-streams-and-tables
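The stream-table duality behind this observation can be illustrated with a small plain-Java sketch (no Kafka involved; the update records are assumptions): replaying a changelog stream materializes the table, and capturing every table update in order reproduces the stream:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StreamTableDuality {
    public static void main(String[] args) {
        // A changelog stream: ordered (key, value) update records (assumed data)
        List<Map.Entry<String, Integer>> stream = List.of(
            new SimpleEntry<>("alice", 1),
            new SimpleEntry<>("bob", 1),
            new SimpleEntry<>("alice", 2)); // later update overwrites the earlier one

        // Stream -> table: replaying the changelog materializes the latest state
        Map<String, Integer> table = new HashMap<>();
        List<Map.Entry<String, Integer>> changelog = new ArrayList<>();
        for (Map.Entry<String, Integer> record : stream) {
            table.put(record.getKey(), record.getValue());
            // Table -> stream: every table update emits a changelog record
            changelog.add(record);
        }

        System.out.println(table);                     // alice=2, bob=1 (order may vary)
        System.out.println(changelog.equals(stream));  // true
    }
}
```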

  • Slide 36 (image-only slide, no recoverable text)

  • Slide 37

    Example: Streams and Tables in Kafka

    Word Count

    hello → 2
    kafka → 1
    world → 1
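The counts above can be reproduced with a plain-Java sketch of incremental counting (no Kafka involved; the input lines are an assumption chosen to match the table):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCountSketch {
    public static void main(String[] args) {
        // Assumed input records; chosen so the counts match the slide
        List<String> lines = List.of("hello kafka", "hello world");

        // The "table" view: counts per word, updated record by record
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\W+")) {
                counts.merge(word, 1L, Long::sum); // each record updates the table
            }
        }
        System.out.println(counts); // hello=2, kafka=1, world=1 (order may vary)
    }
}
```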

  • Slides 38–41 (image-only slides, no recoverable text)

  • Slide 42

    Example: continuously compute current users per geo-region

    [Dashboard figure: live per-region user counts]

    Real-time dashboard: how many users younger than 30y, per region?

    Inputs: the user-locations topic (owned by the mobile team) and the user-prefs topic (owned by the web team). When a new record (alice, Europe) arrives in user-locations, the joined state changes from (alice: Asia, 25y) to (alice: Europe, 25y), so the dashboard applies -1 to the Asia count and +1 to the Europe count.
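The -1/+1 update logic in this example can be sketched in plain Java (an illustration of the incremental-counting idea, not Kafka code; user and region names follow the slide):

```java
import java.util.HashMap;
import java.util.Map;

public class RegionCountSketch {
    static Map<String, String> userRegion = new HashMap<>();
    static Map<String, Long> usersPerRegion = new HashMap<>();

    // Apply one record from the user-locations topic
    static void onLocationRecord(String user, String newRegion) {
        String oldRegion = userRegion.put(user, newRegion);
        if (oldRegion != null) {
            usersPerRegion.merge(oldRegion, -1L, Long::sum); // -1 for the old region
        }
        usersPerRegion.merge(newRegion, 1L, Long::sum);      // +1 for the new region
    }

    public static void main(String[] args) {
        onLocationRecord("alice", "Asia");
        onLocationRecord("bob", "Europe");
        onLocationRecord("alice", "Europe"); // alice moves: Asia -1, Europe +1
        System.out.println(usersPerRegion);  // Asia=0, Europe=2 (order may vary)
    }
}
```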

  • Slide 43

    Example: continuously compute current users per geo-region

    KTable userLocations = builder.table("user-locations-topic");
    KTable userPrefs = builder.table("user-preferences-topic");

  • Slide 44