Introducing Kafka's Streams API

    Stream processing made simple

    Target audience: technical staff, developers, architects
    Expected duration for full deck: 45 minutes

    Apache Kafka: born as a messaging system, now a streaming platform

    [Timeline, 2012 to 2017: cluster mirroring and data compression; 0.9 data integration (Connect API); 0.10 data processing (Streams API)]



    Kafka's Streams API: the easiest way to process data in Apache Kafka

    Key benefits of Apache Kafka's Streams API
    - Build Apps, Not Clusters: no additional cluster required
    - Cluster to go: elastic, scalable, distributed, fault-tolerant, secure
    - Database to go: tables, local state, interactive queries
    - Equally viable for S / M / L / XL / XXL use cases
    - Runs Everywhere: integrates with your existing deployment strategies such as containers, automation, cloud

    - Part of open source Apache Kafka, introduced in 0.10+
    - Powerful client library to build stream processing apps
    - Apps are standard Java applications that run on client machines


    Kafka's Streams API: Unix analogy

    $ cat < in.txt | grep apache | tr a-z A-Z > out.txt
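    The pipeline reads lines, keeps only those containing "apache", and upper-cases them. As a rough illustration of that read, filter, transform shape, here is a minimal plain-Java sketch using `java.util.stream`; it is not Kafka Streams code. The class name `UnixAnalogy`, the method `pipeline`, and the in-memory list standing in for `in.txt` are hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class UnixAnalogy {
    // Rough equivalent of: grep apache | tr a-z A-Z
    static List<String> pipeline(List<String> lines) {
        return lines.stream()
                .filter(line -> line.contains("apache"))     // grep apache
                .map(line -> line.toUpperCase(Locale.ROOT))  // tr a-z A-Z (for ASCII input)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical contents of in.txt; printing stands in for "> out.txt"
        List<String> in = List.of("apache kafka", "hello world", "apache spark");
        System.out.println(pipeline(in)); // prints [APACHE KAFKA, APACHE SPARK]
    }
}
```

    Like the shell pipeline, each step only describes what happens to the data; where the data comes from and goes to is handled outside the processing logic.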

    Streams API in the context of Kafka

    [Diagram: Kafka Cluster, Connect API, Streams API, Your App]



    When to use Kafka's Streams API

    Mainstream application development
    - To build core business applications
    - Microservices
    - Fast Data apps for small and big data
    - Reactive applications
    - Continuous queries and transformations
    - Event-triggered processes
    - The T in ETL

    Use case examples
    - Real-time monitoring and intelligence
    - Customer 360-degree view
    - Fraud detection
    - Location-based marketing
    - Fleet management

    Some public use cases in the wild & external articles
    - Applying Kafka's Streams API for an internal message delivery pipeline at LINE Corp.: Kafka Streams in production at LINE, a social platform based in Japan with 220+ million users
    - Microservices and reactive applications at Capital One
    - User behavior analysis
    - Containerized Kafka Streams applications in Scala
    - Geo-spatial data analysis
    - Language classification with machine learning

    Do more with less

    Architecture comparison: use case example

    Real-time dashboard for security monitoring: which of my data centers are under attack?

    [Diagram labels: Your Job, App, Dashboard Frontend, Other App]

    1. Capture business events in Kafka.
    2. Must process events with a separate cluster (e.g. Spark).
    3. Must share latest results through separate systems (e.g. MySQL).
    4. Other apps access latest results by querying these DBs.

    Before: undue complexity, heavy footprint, many technologies, split ownership with conflicting priorities

    [Diagram labels: Your App, Dashboard Frontend, Other App]

    1. Capture business events in Kafka.
    2. Process events with standard Java apps that use Kafka Streams.
    3. Now other apps can directly query the latest results.

    With Kafka Streams: a simplified, app-centric architecture that puts app owners in control
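    Letting other apps query the latest results directly means the processing app keeps its results in local state and answers reads itself, with no separate results database. A plain-Java sketch of that idea, not the Kafka Streams API: the class `AttackDashboardState`, its methods, and the event shape (just a data center name) are hypothetical, and the network layer other apps would use (e.g. REST) is omitted. In Kafka Streams this role is played by local state stores exposed through interactive queries.

```java
import java.util.HashMap;
import java.util.Map;

public class AttackDashboardState {
    // Local state: number of suspicious events seen so far per data center
    private final Map<String, Long> attacksPerDataCenter = new HashMap<>();

    // Update step: called once per incoming event (hypothetical event shape: a data center name)
    void onSuspiciousEvent(String dataCenter) {
        attacksPerDataCenter.merge(dataCenter, 1L, Long::sum);
    }

    // Query step: what other apps would call (e.g. over REST) to read the latest result
    long query(String dataCenter) {
        return attacksPerDataCenter.getOrDefault(dataCenter, 0L);
    }

    public static void main(String[] args) {
        AttackDashboardState state = new AttackDashboardState();
        for (String dc : new String[] {"dc-eu", "dc-us", "dc-eu"}) {
            state.onSuspiciousEvent(dc);
        }
        System.out.println(state.query("dc-eu"));   // prints 2
        System.out.println(state.query("dc-asia")); // prints 0
    }
}
```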

    How do I install the Streams API?

    There is, and there should be, no installation. Build Apps, Not Clusters! It's a library: add it to your app like any other library.


    But wait a minute: where's THE CLUSTER to process the data?

    No cluster needed. Build Apps, Not Clusters! Unlearn bad habits: "do cool stuff with data" does not imply "must have cluster".

    Organizational benefits: decouple teams and roadmaps, scale people


    Infrastructure team (Kafka as a shared, multi-tenant service)

    Apps on top, each owned by its own team:
    - Fraud detection (Payments team)
    - Recommendations app (Mobile team)
    - Operations team
    - ...more apps...


    How do I package, deploy, monitor my apps? How do I …?

    Whatever works for you. Stick to what you/your company think is the best way. No magic needed. Why? Because an app that uses the Streams API is a normal Java app.

    Available APIs

    The API is but the tip of the iceberg

    [Figure: an iceberg, with "API, coding" as the visible tip and "Org. processes" below the surface]





    API option 1: DSL (declarative)

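    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.*;

    StreamsBuilder builder = new StreamsBuilder();
    KStream<Integer, Integer> input = builder.stream("numbers-topic",
        Consumed.with(Serdes.Integer(), Serdes.Integer()));

    // Stateless computation: double every value
    KStream<Integer, Integer> doubled = input.mapValues(v -> v * 2);

    // Stateful computation: running sum of all odd values
    KTable<Integer, Integer> sumOfOdds = input
        .filter((k, v) -> v % 2 != 0)
        .selectKey((k, v) -> 1)   // one shared key, so the sum covers all odd values
        .groupByKey(Grouped.with(Serdes.Integer(), Serdes.Integer()))
        .reduce((v1, v2) -> v1 + v2, Materialized.as("sum-of-odds"));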

    The preferred API for most use cases.

    Particularly appeals to:

    Fans of Scala, functional programming

    Users familiar with e.g. Spark
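    To make the difference between the two DSL computations concrete without a Kafka cluster, here is a plain-Java simulation of their semantics; it is not the Kafka Streams API. The class `DslSemanticsDemo`, its methods, and the input sequence are hypothetical. `doubled` emits exactly one output per input record, while `sumOfOdds` behaves like a table with a single row (key 1) whose value is updated on every odd input.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DslSemanticsDemo {
    // Stateless: each input value maps to exactly one output value
    static List<Integer> doubled(List<Integer> input) {
        List<Integer> out = new ArrayList<>();
        for (int v : input) {
            out.add(v * 2);
        }
        return out;
    }

    // Stateful: the successive values of the "sum-of-odds" table row for key 1
    static List<Integer> sumOfOddsUpdates(List<Integer> input) {
        Map<Integer, Integer> table = new HashMap<>();
        List<Integer> updates = new ArrayList<>();
        for (int v : input) {
            if (v % 2 != 0) {                    // filter: keep odd values
                table.merge(1, v, Integer::sum); // selectKey -> 1, then reduce with +
                updates.add(table.get(1));
            }
        }
        return updates;
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 4, 5);
        System.out.println(doubled(input));          // prints [2, 4, 6, 8, 10]
        System.out.println(sumOfOddsUpdates(input)); // prints [1, 4, 9]
    }
}
```

    In Kafka Streams itself, record caching may coalesce some of these intermediate table updates, so fewer than one update per odd input may be forwarded downstream.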

    API option 2: Processor API (imperative)

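    import java.time.Duration;
    import org.apache.kafka.streams.processor.PunctuationType;
    import org.apache.kafka.streams.processor.api.*;

    // Prints each value and forwards nothing downstream (output types are Void)
    class PrintToConsoleProcessor<K, V> implements Processor<K, V, Void, Void> {
        @Override
        public void init(ProcessorContext<Void, Void> context) {
            // Periodic callback that replaces the old punctuate(long timestamp);
            // the 10-second interval is illustrative
            context.schedule(Duration.ofSeconds(10), PunctuationType.WALL_CLOCK_TIME, timestamp -> {});
        }
        @Override
        public void process(Record<K, V> record) {
            System.out.println("Got value " + record.value());
        }
        @Override
        public void close() {}
    }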


    Full flexibility but more manual work

    Appeals to:

    Users who require functionality that is not yet available in the DSL

    Users familiar with e.g. Storm, Samza

    Still, check out the DSL!
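    The "more manual work" of the Processor API comes from implementing lifecycle callbacks that the runtime invokes: setup once, one call per record, cleanup at shutdown. The plain-Java sketch below shows only that call order. `SimpleProcessor`, `LoggingProcessor`, and `run` are hypothetical stand-ins, not the Kafka Streams `Processor` interface, and the toy runtime ignores scheduling, state, and forwarding.

```java
import java.util.ArrayList;
import java.util.List;

public class ProcessorLifecycleDemo {
    // Hypothetical, simplified callback interface; NOT the Kafka Streams Processor API
    interface SimpleProcessor<K, V> {
        void init();
        void process(K key, V value);
        void close();
    }

    // Records each callback in a log instead of printing to the console
    static class LoggingProcessor implements SimpleProcessor<String, Integer> {
        final List<String> log = new ArrayList<>();
        public void init() { log.add("init"); }
        public void process(String key, Integer value) { log.add("Got value " + value); }
        public void close() { log.add("close"); }
    }

    // Toy runtime: init once, process every record in order, close once
    static <K, V> void run(SimpleProcessor<K, V> p, List<K> keys, List<V> values) {
        p.init();
        for (int i = 0; i < keys.size(); i++) {
            p.process(keys.get(i), values.get(i));
        }
        p.close();
    }

    public static void main(String[] args) {
        LoggingProcessor p = new LoggingProcessor();
        run(p, List.of("a", "b"), List.of(1, 2));
        System.out.println(p.log); // prints [init, Got value 1, Got value 2, close]
    }
}
```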

    When to use Kafka Streams vs. Kafka's normal consumer clients

    Kafka Streams: basically all the time

    Kafka consumer clients (Java, C/C++, Python, Go, …): when you must interact with Kafka at a very low level and/or in a very special way. Example: when integrating your own stream processing tool (Spark, Storm) with Kafka.

    Code comparison: featuring Kafka with Streams API vs. Spark Streaming

    My WordCount is better than your WordCount (?)



    These isolated code snippets are nice (and actually quite similar), but they are not very meaningful. In practice, we also need to read data from somewhere, write data back to somewhere, etc., but none of this is visible here.

    WordCount in Kafka


    Compared to: WordCount in Spark 2.0




    Runtime model leaks into processing logic (here: interfacing from Spark with Kafka)


    Compared to: WordCount in Spark 2.0


    Runtime model leaks into processing logic (driver vs. executors)

    Key concepts


    [Figure: key concepts of Kafka core vs. Kafka Streams]

    Streams and Tables: Stream Processing meets Databases

    Key observation: close relationship between Streams and Tables
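    One way to see the relationship: replaying a stream of key/value updates and keeping the latest value per key yields a table, and recording every change made to a table yields a stream (its changelog). The plain-Java sketch below, not Kafka Streams code, shows both directions; `StreamTableDuality`, `toTable`, and `ChangeloggedTable` are hypothetical names, and the alice/bob data mirrors the later geo-region example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StreamTableDuality {
    // Stream -> table: replay key/value updates, keeping the latest value per key
    static Map<String, String> toTable(List<Map.Entry<String, String>> stream) {
        Map<String, String> table = new HashMap<>();
        for (Map.Entry<String, String> update : stream) {
            table.put(update.getKey(), update.getValue());
        }
        return table;
    }

    // Table -> stream: every change made to the table is appended to a changelog
    static class ChangeloggedTable {
        final Map<String, String> table = new HashMap<>();
        final List<Map.Entry<String, String>> changelog = new ArrayList<>();

        void put(String key, String value) {
            table.put(key, value);
            changelog.add(Map.entry(key, value));
        }
    }

    public static void main(String[] args) {
        ChangeloggedTable t = new ChangeloggedTable();
        t.put("alice", "Asia");
        t.put("bob", "Europe");
        t.put("alice", "Europe"); // overwrites alice's row, but is still a new changelog record
        System.out.println(toTable(t.changelog).equals(t.table)); // prints true
        System.out.println(t.changelog.size());                   // prints 3
    }
}
```

    Kafka Streams relies on the same idea for fault tolerance: local state stores are backed by changelog topics in Kafka, so a store can be rebuilt by replaying its changelog.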

    Example: Streams and Tables in Kafka

    Word Count

    word  | count
    ------|------
    hello | 2
    kafka | 1
    world | 1
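    A word-count table like this one is continuously maintained from a stream of text lines: each incoming line updates the counts of the words it contains. A plain-Java sketch of that update logic, not the Kafka Streams API; the slide does not show its input, so the lines "hello kafka" and "hello world" are a hypothetical input chosen to produce the counts hello 2, kafka 1, world 1.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class WordCountTable {
    // Folds each incoming line into the counts table, one update per word
    static Map<String, Long> count(List<String> lines) {
        Map<String, Long> counts = new TreeMap<>(); // sorted only to make output readable
        for (String line : lines) {
            for (String word : line.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical input stream of text lines
        System.out.println(count(List.of("hello kafka", "hello world")));
        // prints {hello=2, kafka=1, world=1}
    }
}
```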

    Example: continuously compute current users per geo-region

    Real-time dashboard: how many users younger than 30y, per region?

    Input tables: user-locations (mobile team) and user-prefs (web team)

    Incoming update in user-locations: alice Europe

    Joined user data changes from: alice Asia, 25y; bob Europe, 46y
    to: alice Europe, 25y; bob Europe, 46y

    Example: continuously compute current users per geo-region

    KTable userLocations = builder.table("user-locations-topic");
    KTable userPrefs = builder.table("user-preferences-topic");
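    The dashboard query joins the two tables on the user, keeps users younger than 30, and counts them per region. A plain-Java simulation of that logic, not the Kafka Streams API: in-memory maps stand in for the two tables, the method `youngUsersPerRegion` is hypothetical, and it assumes the ages are available from the user-prefs side. It recomputes the result after alice's location changes from Asia to Europe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class UsersPerRegion {
    // Join locations with ages on user id, keep users younger than 30, count per region
    static Map<String, Integer> youngUsersPerRegion(Map<String, String> userLocations,
                                                    Map<String, Integer> userAges) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, String> e : userLocations.entrySet()) {
            Integer age = userAges.get(e.getKey()); // inner join: skip users without prefs
            if (age != null && age < 30) {
                counts.merge(e.getValue(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> locations = new HashMap<>(Map.of("alice", "Asia", "bob", "Europe"));
        Map<String, Integer> ages = Map.of("alice", 25, "bob", 46);
        System.out.println(youngUsersPerRegion(locations, ages)); // prints {Asia=1}

        locations.put("alice", "Europe"); // new record in user-locations
        System.out.println(youngUsersPerRegion(locations, ages)); // prints {Europe=1}
    }
}
```

    This sketch recomputes the whole result on each change for clarity. Kafka Streams instead updates aggregates incrementally, e.g. decrementing Asia and incrementing Europe when alice moves.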

