Introducing Kafka's Streams API

Transcript
Page 1: Introducing Kafka's Streams API


Introducing Kafka’s Streams API
Stream processing made simple

Target audience: technical staff, developers, architects
Expected duration for full deck: 45 minutes

Page 2: Introducing Kafka's Streams API

Apache Kafka: birthed as a messaging system, now a streaming platform

Timeline of major releases:
• 0.7 (2012): cluster mirroring, data compression
• 0.8 (2013): intra-cluster replication
• 0.9 (2015): data integration (Connect API)
• 0.10 (2016): data processing (Streams API)

Page 3: Introducing Kafka's Streams API


Kafka’s Streams API: the easiest way to process data in Apache Kafka

Key Benefits of Apache Kafka’s Streams API
• Build Apps, Not Clusters: no additional cluster required
• Cluster to go: elastic, scalable, distributed, fault-tolerant, secure
• Database to go: tables, local state, interactive queries
• Equally viable for S / M / L / XL / XXL use cases
• “Runs Everywhere”: integrates with your existing deployment strategies such as containers, automation, cloud

Part of open source Apache Kafka, introduced in 0.10+
• Powerful client library to build stream processing apps
• Apps are standard Java applications that run on client machines
• https://github.com/apache/kafka/tree/trunk/streams

[Diagram: Your App, with the Streams API embedded as a library, reads from and writes to a Kafka Cluster.]

Page 4: Introducing Kafka's Streams API


Kafka’s Streams API: Unix analogy

$ cat < in.txt | grep "apache" | tr a-z A-Z > out.txt

[Diagram: in the analogy, the Kafka Cluster plays the role of the files and pipes, the Connect API covers reading in.txt and writing out.txt, and the Streams API is the grep and tr processing in between.]

Page 5: Introducing Kafka's Streams API


Streams API in the context of Kafka

[Diagram: Other Systems → Connect API → Kafka Cluster → Connect API → Other Systems; Your App embeds the Streams API to read from and write to the Kafka Cluster.]

Page 6: Introducing Kafka's Streams API


When to use Kafka’s Streams API

• Mainstream Application Development
• To build core business applications
• Microservices
• Fast Data apps for small and big data
• Reactive applications
• Continuous queries and transformations
• Event-triggered processes
• The “T” in ETL
• <and more>

Use case examples
• Real-time monitoring and intelligence
• Customer 360-degree view
• Fraud detection
• Location-based marketing
• Fleet management
• <and more>

Page 7: Introducing Kafka's Streams API


Some public use cases in the wild & external articles

• Applying Kafka’s Streams API for internal message delivery pipeline at LINE Corp.
  • http://developers.linecorp.com/blog/?p=3960
  • Kafka Streams in production at LINE, a social platform based in Japan with 220+ million users

• Microservices and reactive applications at Capital One
  • https://speakerdeck.com/bobbycalderwood/commander-decoupled-immutable-rest-apis-with-kafka-streams

• User behavior analysis
  • https://timothyrenner.github.io/engineering/2016/08/11/kafka-streams-not-looking-at-facebook.html

• Containerized Kafka Streams applications in Scala
  • https://www.madewithtea.com/processing-tweets-with-kafka-streams.html

• Geo-spatial data analysis
  • http://www.infolace.com/blog/2016/07/14/simple-spatial-windowing-with-kafka-streams/

• Language classification with machine learning
  • https://dzone.com/articles/machine-learning-with-kafka-streams

Page 8: Introducing Kafka's Streams API


Do more with less

Page 9: Introducing Kafka's Streams API


Architecture comparison: use case example

Real-time dashboard for security monitoring
“Which of my data centers are under attack?”

Page 10: Introducing Kafka's Streams API

Architecture comparison: use case example

Before: undue complexity, heavy footprint, many technologies, split ownership with conflicting priorities
1. Capture business events in Kafka
2. Must process events with a separate cluster (e.g. Spark)
3. Must share latest results through separate systems (e.g. MySQL)
4. Other apps (dashboard frontend, other apps) access the latest results by querying these DBs

With Kafka Streams: simplified, app-centric architecture that puts app owners in control
1. Capture business events in Kafka
2. Process events with standard Java apps that use Kafka Streams
3. Now other apps can directly query the latest results

Page 11: Introducing Kafka's Streams API

Page 12: Introducing Kafka's Streams API

Page 13: Introducing Kafka's Streams API

How do I install the Streams API?

• There is and there should be no “installation” – Build Apps, Not Clusters!
• It’s a library. Add it to your app like any other library.

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-streams</artifactId>
    <version>0.10.1.1</version>
</dependency>
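Once the dependency is on the classpath, a Streams application is just a Java program with a main() method. A minimal sketch, assuming the 0.10-era entry point KStreamBuilder (the topic names and application ID below are illustrative):

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStreamBuilder;

public class MyStreamsApp {
    public static void main(String[] args) {
        Properties config = new Properties();
        // Unique ID for this application; also used to name internal topics.
        config.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        config.put(StreamsConfig.KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        config.put(StreamsConfig.VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        KStreamBuilder builder = new KStreamBuilder();
        // Trivial topology: copy records from one topic to another.
        builder.stream("input-topic").to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder, config);
        streams.start();
        // Stop cleanly on JVM shutdown – it is a normal Java app.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}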

Page 14: Introducing Kafka's Streams API


“But wait a minute – where’s THE CLUSTER to process the data?”

• No cluster needed – Build Apps, Not Clusters!
• Unlearn bad habits: “do cool stuff with data ≠ must have cluster”

Ok. Ok. Ok.

Page 15: Introducing Kafka's Streams API


Organizational benefits: decouple teams and roadmaps, scale people

Page 16: Introducing Kafka's Streams API


Organizational benefits: decouple teams and roadmaps, scale people

Infrastructure Team (Kafka as a shared, multi-tenant service)

• Payments team: fraud detection app
• Mobile team: recommendations app
• Operations team: security alerts app
• …more apps…

Page 17: Introducing Kafka's Streams API


How do I package, deploy, monitor my apps? How do I …?

• Whatever works for you. Stick to what you/your company think is the best way.
• No magic needed.
• Why? Because an app that uses the Streams API is… a normal Java app.

Page 18: Introducing Kafka's Streams API


Available APIs

Page 19: Introducing Kafka's Streams API

The API is but the tip of the iceberg

[Diagram: iceberg – above the waterline: API, coding; below, the larger Reality™: deployment, operations, security, architecture, debugging, org. processes]

Page 20: Introducing Kafka's Streams API


• API option 1: DSL (declarative)

KStream<Integer, Integer> input =
    builder.stream("numbers-topic");

// Stateless computation
KStream<Integer, Integer> doubled =
    input.mapValues(v -> v * 2);

// Stateful computation
KTable<Integer, Integer> sumOfOdds = input
    .filter((k, v) -> v % 2 != 0)
    .selectKey((k, v) -> 1)
    .groupByKey()
    .reduce((v1, v2) -> v1 + v2, "sum-of-odds");

The preferred API for most use cases.

Particularly appeals to:

• Fans of Scala, functional programming

• Users familiar with e.g. Spark

Page 21: Introducing Kafka's Streams API


• API option 2: Processor API (imperative)

class PrintToConsoleProcessor<K, V> implements Processor<K, V> {

    @Override
    public void init(ProcessorContext context) {}

    @Override
    public void process(K key, V value) {
        System.out.println("Got value " + value);
    }

    @Override
    public void punctuate(long timestamp) {}

    @Override
    public void close() {}
}

Full flexibility but more manual work

Appeals to:

• Users who require functionality that is not yet available in the DSL

• Users familiar with e.g. Storm, Samza

• Still, check out the DSL!

A custom processor must also be wired into a topology by hand; see the sketch below.
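A minimal wiring sketch, assuming the 0.10-era TopologyBuilder (node and topic names are illustrative):

import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.processor.TopologyBuilder;

TopologyBuilder builder = new TopologyBuilder();
builder.addSource("Source", "input-topic")
       // PrintToConsoleProcessor is the processor from the slide above.
       .addProcessor("Print", PrintToConsoleProcessor::new, "Source");

KafkaStreams streams = new KafkaStreams(builder, config);  // config as shown earlier
streams.start();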

Page 22: Introducing Kafka's Streams API


When to use Kafka Streams vs. Kafka’s “normal” consumer clients

Kafka Streams

• Basically all the time

Kafka consumer clients (Java, C/C++, Python, Go, …)

• When you must interact with Kafka at a very low level and/or in a very special way
• Example: when integrating your own stream processing tool (Spark, Storm) with Kafka

Page 23: Introducing Kafka's Streams API


Code comparison
Featuring Kafka with Streams API <-> Spark Streaming

Page 24: Introducing Kafka's Streams API


“My WordCount is better than your WordCount” (?)

[Side-by-side code snippets: WordCount in Kafka vs. WordCount in Spark]

These isolated code snippets are nice (and actually quite similar) but they are not very meaningful. In practice, we also need to read data from somewhere, write data back to somewhere, etc. – but we can see none of this here.

Page 25: Introducing Kafka's Streams API


WordCount in Kafka

[Code snippet: the full WordCount application in Kafka Streams]
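The slide's code is an image and isn't reproduced in this transcript. A rough sketch of what a 0.10-era Streams WordCount looks like (topic and store names are illustrative):

import java.util.Arrays;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KStreamBuilder;
import org.apache.kafka.streams.kstream.KTable;

KStreamBuilder builder = new KStreamBuilder();
KStream<String, String> textLines = builder.stream("text-input");

KTable<String, Long> wordCounts = textLines
    // Split each line of text into lowercase words.
    .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
    // Re-key each word so identical words are grouped together.
    .groupBy((key, word) -> word)
    // Count occurrences per word; "Counts" names the backing state store.
    .count("Counts");

wordCounts.to(Serdes.String(), Serdes.Long(), "words-with-counts");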

Page 26: Introducing Kafka's Streams API


Compared to: WordCount in Spark 2.0

[Code snippet with callouts 1–3]

Runtime model leaks into processing logic (here: interfacing from Spark with Kafka)

Page 27: Introducing Kafka's Streams API


Compared to: WordCount in Spark 2.0

[Code snippet with callouts 4–5]

Runtime model leaks into processing logic (driver vs. executors)

Page 28: Introducing Kafka's Streams API


Key concepts

Page 29: Introducing Kafka's Streams API


Key concepts

Page 30: Introducing Kafka's Streams API


Key concepts

Page 31: Introducing Kafka's Streams API


Key concepts

[Diagram: key concepts mapped between Kafka Core and Kafka Streams]

Page 32: Introducing Kafka's Streams API

Streams and Tables
Stream Processing meets Databases

Page 33: Introducing Kafka's Streams API


Page 34: Introducing Kafka's Streams API


Page 35: Introducing Kafka's Streams API


Key observation: close relationship between Streams and Tables

http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple
http://docs.confluent.io/current/streams/concepts.html#duality-of-streams-and-tables

Page 36: Introducing Kafka's Streams API

Page 37: Introducing Kafka's Streams API

Example: Streams and Tables in Kafka

Word Count

  hello   2
  kafka   1
  world   1
  …       …

Page 38: Introducing Kafka's Streams API

Page 39: Introducing Kafka's Streams API

Page 40: Introducing Kafka's Streams API

Page 41: Introducing Kafka's Streams API

Page 42: Introducing Kafka's Streams API

Example: continuously compute current users per geo-region

[Diagram: the topics user-locations (mobile team) and user-prefs (web team) feed a real-time dashboard asking “How many users younger than 30y, per region?”. A new record “alice → Europe” in user-locations updates Alice's profile from (Asia, 25y, …) to (Europe, 25y, …), and the per-region counts change accordingly: Asia −1, Europe +1.]

Page 43: Introducing Kafka's Streams API


Example: continuously compute current users per geo-region

KTable<UserId, Location> userLocations = builder.table("user-locations-topic");
KTable<UserId, Prefs>    userPrefs     = builder.table("user-preferences-topic");

Page 44: Introducing Kafka's Streams API


Example: continuously compute current users per geo-region

[Diagram: a new record “alice → Europe” in user-locations updates the KTable userProfiles from “alice → (Asia, 25y, …)” to “alice → (Europe, 25y, …)”; other entries such as “bob → (Europe, 46y, …)” are unchanged.]

KTable<UserId, Location> userLocations = builder.table("user-locations-topic");
KTable<UserId, Prefs>    userPrefs     = builder.table("user-preferences-topic");

// Merge into detailed user profiles (continuously updated)
KTable<UserId, UserProfile> userProfiles =
    userLocations.join(userPrefs, (loc, prefs) -> new UserProfile(loc, prefs));

Page 45: Introducing Kafka's Streams API


Example: continuously compute current users per geo-region

KTable<UserId, Location> userLocations = builder.table("user-locations-topic");
KTable<UserId, Prefs>    userPrefs     = builder.table("user-preferences-topic");

// Merge into detailed user profiles (continuously updated)
KTable<UserId, UserProfile> userProfiles =
    userLocations.join(userPrefs, (loc, prefs) -> new UserProfile(loc, prefs));

// Compute per-region statistics (continuously updated);
// the result is keyed by the grouped-by location
KTable<Location, Long> usersPerRegion = userProfiles
    .filter((userId, profile)  -> profile.age < 30)
    .groupBy((userId, profile) -> profile.location)
    .count();

[Diagram: the update “alice → Europe” changes the KTable usersPerRegion from {Africa 3, Asia 8, Europe 5, …} to {Africa 3, Asia 7, Europe 6, …}.]

Page 46: Introducing Kafka's Streams API


Example: continuously compute current users per geo-region

[Diagram repeated: the dashboard now shows the updated per-region counts after Alice's move (Asia 8 → 7, Europe 5 → 6).]

Page 47: Introducing Kafka's Streams API


Streams meet Tables – in the DSL

Page 48: Introducing Kafka's Streams API


Streams meet Tables

• Most use cases for stream processing require both Streams and Tables
  • Essential for any stateful computations

• Kafka ships with first-class support for Streams and Tables
  • Scalability, fault tolerance, efficient joins and aggregations, …

• Benefits include: simplified architectures, fewer moving pieces, less Do-It-Yourself work

Page 49: Introducing Kafka's Streams API


Key features

Page 50: Introducing Kafka's Streams API


Key features in 0.10

• Native, 100%-compatible Kafka integration

Page 51: Introducing Kafka's Streams API


Native, 100% compatible Kafka integration

[Code snippet with callouts: read from Kafka, write to Kafka]

Page 52: Introducing Kafka's Streams API


Key features in 0.10

• Native, 100%-compatible Kafka integration
• Secure stream processing using Kafka’s security features

Page 53: Introducing Kafka's Streams API


Secure stream processing with the Streams API

• Your applications can leverage all client-side security features in Apache Kafka

• Security features include:
  • Encrypting data-in-transit between applications and Kafka clusters
  • Authenticating applications against Kafka clusters (“only some apps may talk to the production cluster”)
  • Authorizing applications against Kafka clusters (“only some apps may read data from sensitive topics”)

Page 54: Introducing Kafka's Streams API


Configuring security settings

• In general, you can configure both Kafka Streams and the underlying Kafka clients in your apps

Page 55: Introducing Kafka's Streams API


Configuring security settings

• Example: encrypting data-in-transit + client authentication to Kafka cluster

Full demo application at https://github.com/confluentinc/examples
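The slide's configuration snippet is an image and isn't reproduced here. A sketch of the idea, using the standard Kafka client security settings (the file paths and passwords below are placeholders):

import java.util.Properties;
import org.apache.kafka.clients.CommonClientConfigs;
import org.apache.kafka.common.config.SslConfigs;
import org.apache.kafka.streams.StreamsConfig;

Properties config = new Properties();
config.put(StreamsConfig.APPLICATION_ID_CONFIG, "secure-streams-app");
config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka.example.com:9093");
// Encrypt data-in-transit and authenticate this app via SSL client certificates;
// these settings are passed through to the underlying Kafka clients.
config.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SSL");
config.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, "/etc/security/kafka.client.truststore.jks");
config.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, "truststore-password");
config.put(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG, "/etc/security/kafka.client.keystore.jks");
config.put(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, "keystore-password");
config.put(SslConfigs.SSL_KEY_PASSWORD_CONFIG, "key-password");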

Page 56: Introducing Kafka's Streams API


Key features in 0.10

• Native, 100%-compatible Kafka integration
• Secure stream processing using Kafka’s security features
• Elastic and highly scalable
• Fault-tolerant

Page 57: Introducing Kafka's Streams API

Page 58: Introducing Kafka's Streams API

Page 59: Introducing Kafka's Streams API

Page 60: Introducing Kafka's Streams API

Key features in 0.10

• Native, 100%-compatible Kafka integration
• Secure stream processing using Kafka’s security features
• Elastic and highly scalable
• Fault-tolerant
• Stateful and stateless computations

Page 61: Introducing Kafka's Streams API


Stateful computations

• Stateful computations like aggregations (e.g. counting), joins, or windowing require state
• State stores are the backbone of state management
  • … are local for best performance
  • … are backed up to Kafka for elasticity and for fault-tolerance
  • … are per stream task for isolation – think: share-nothing

• Pluggable storage engines
  • Default: RocksDB (a key-value store) to allow for local state that is larger than available RAM
  • You can also use your own, custom storage engine

• From the user perspective:
  • DSL: no need to worry about anything, state management is done automatically for you
  • Processor API: direct access to state stores – very flexible but more manual work

Page 62: Introducing Kafka's Streams API

Page 63: Introducing Kafka's Streams API

Page 64: Introducing Kafka's Streams API

Page 65: Introducing Kafka's Streams API

Page 66: Introducing Kafka's Streams API

Use case: real-time, distributed joins at large scale

Page 67: Introducing Kafka's Streams API

Use case: real-time, distributed joins at large scale

Page 68: Introducing Kafka's Streams API

Use case: real-time, distributed joins at large scale

Page 69: Introducing Kafka's Streams API

Stateful computations

• Use the Processor API to interact directly with state stores

[Code snippet with callouts: get the store, then use the store]
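The snippet itself is an image and isn't reproduced here. A sketch of direct store access from inside a Processor, assuming a store named "my-store" was registered with the topology (the store name and types are illustrative):

import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.KeyValueStore;

public class CountingProcessor implements Processor<String, String> {

    private KeyValueStore<String, Long> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        // Get the store (registered with the topology under "my-store").
        store = (KeyValueStore<String, Long>) context.getStateStore("my-store");
    }

    @Override
    public void process(String key, String value) {
        // Use the store: maintain a running count per key.
        Long count = store.get(key);
        store.put(key, count == null ? 1L : count + 1);
    }

    @Override
    public void punctuate(long timestamp) {}

    @Override
    public void close() {}
}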

Page 70: Introducing Kafka's Streams API


Key features in 0.10

• Native, 100%-compatible Kafka integration
• Secure stream processing using Kafka’s security features
• Elastic and highly scalable
• Fault-tolerant
• Stateful and stateless computations
• Interactive queries

Page 71: Introducing Kafka's Streams API


Page 72: Introducing Kafka's Streams API


Interactive Queries: architecture comparison

Before (0.10.0):
1. Capture business events in Kafka
2. Process the events with Kafka Streams
3. Must use external systems to share latest results (!)
4. Other apps query external systems for latest results

After (0.10.1): simplified, more app-centric architecture:
1. Capture business events in Kafka
2. Process the events with Kafka Streams
3. Now other apps can directly query the latest results

Page 73: Introducing Kafka's Streams API


Key features in 0.10

• Native, 100%-compatible Kafka integration
• Secure stream processing using Kafka’s security features
• Elastic and highly scalable
• Fault-tolerant
• Stateful and stateless computations
• Interactive queries
• Time model

Page 74: Introducing Kafka's Streams API


Time

Page 75: Introducing Kafka's Streams API


Time

[Diagram: a timeline with events A, B, and C]

Page 76: Introducing Kafka's Streams API


Time

• You configure the desired time semantics through timestamp extractors
• The default extractor yields event-time semantics
  • It extracts the embedded timestamps of Kafka messages (introduced in v0.10)

A custom extractor can derive timestamps from the message payload instead; see the sketch below.
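A minimal sketch of a payload-based extractor, assuming the 0.10-era single-method TimestampExtractor interface (the Order type and its timestampMillis() accessor are hypothetical):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

public class OrderTimestampExtractor implements TimestampExtractor {
    @Override
    public long extract(ConsumerRecord<Object, Object> record) {
        if (record.value() instanceof Order) {
            // Event-time as recorded inside the message payload.
            return ((Order) record.value()).timestampMillis();
        }
        // Fall back to the timestamp embedded in the Kafka message.
        return record.timestamp();
    }
}

It is registered via configuration, e.g. config.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG, OrderTimestampExtractor.class);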

Page 77: Introducing Kafka's Streams API


Key features in 0.10

• Native, 100%-compatible Kafka integration
• Secure stream processing using Kafka’s security features
• Elastic and highly scalable
• Fault-tolerant
• Stateful and stateless computations
• Interactive queries
• Time model
• Windowing

Page 78: Introducing Kafka's Streams API


Windowing

• Group events in a stream using time-based windows
• Use case examples:
  • Time-based analysis of ad impressions (“number of ads clicked in the past hour”)
  • Monitoring statistics of telemetry data (“1min/5min/15min averages”)

[Diagram: input events from users alice, bob, and dave plotted along processing-time and event-time; colors represent different users' events, and rectangles denote different event-time windows.]

Page 79: Introducing Kafka's Streams API


Windowing in the DSL

TimeWindows.of(3000)                  // tumbling windows, 3 seconds each

TimeWindows.of(3000).advanceBy(1000)  // hopping windows, 3 seconds each, advancing by 1 second
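A sketch of how such windows might be used in the DSL of that era, counting clicks per ad in hopping windows (topic and store names are illustrative):

KStream<String, String> adClicks = builder.stream("ad-clicks-topic");

// Count clicks per ad key in 3-second windows advancing by 1 second.
KTable<Windowed<String>, Long> clicksPerWindow = adClicks
    .groupByKey()
    .count(TimeWindows.of(3000).advanceBy(1000), "clicks-per-window");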

Page 80: Introducing Kafka's Streams API


Key features in 0.10

• Native, 100%-compatible Kafka integration
• Secure stream processing using Kafka’s security features
• Elastic and highly scalable
• Fault-tolerant
• Stateful and stateless computations
• Interactive queries
• Time model
• Windowing
• Supports late-arriving and out-of-order data

Page 81: Introducing Kafka's Streams API


Out-of-order and late-arriving data

• Is very common in practice, not a rare corner case
• Related to the time model discussion

Page 82: Introducing Kafka's Streams API


Out-of-order and late-arriving data: an example of when this will happen

Users with mobile phones enter an airplane and lose Internet connectivity.

Emails are being written during the 10h flight.

Internet connectivity is restored; the phones now send their queued emails.

Page 83: Introducing Kafka's Streams API


Out-of-order and late-arriving data

• Is very common in practice, not a rare corner case
• Related to the time model discussion

• We want control over how out-of-order data is handled, and handling must be efficient
• Example: we process data in 5-minute windows, e.g. to compute statistics
  • Option A: an event arrives 1 minute late – update the original result!
  • Option B: an event arrives 2 hours late – discard it!

Window retention determines the cutoff between the two; see the sketch below.
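In the DSL, how long a window remains updatable is controlled by its retention. A sketch using the windows' until() setting (the concrete durations are illustrative):

// 5-minute windows whose state is retained for 2 hours:
// an event arriving 1 minute late still updates the original result (Option A),
// while an event arriving after retention has elapsed is discarded (Option B).
TimeWindows.of(5 * 60 * 1000L).until(2 * 60 * 60 * 1000L)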

Page 84: Introducing Kafka's Streams API


Key features in 0.10

• Native, 100%-compatible Kafka integration
• Secure stream processing using Kafka’s security features
• Elastic and highly scalable
• Fault-tolerant
• Stateful and stateless computations
• Interactive queries
• Time model
• Windowing
• Supports late-arriving and out-of-order data
• Millisecond processing latency, no micro-batching
• At-least-once processing guarantees (exactly-once is in the works as we speak)

Page 85: Introducing Kafka's Streams API


Roadmap Outlook

Page 86: Introducing Kafka's Streams API


Roadmap outlook for Kafka Streams

• Exactly-Once processing semantics
• Unified API for real-time processing and “batch” processing
• Global KTables
• Session windows
• … and more …

Page 87: Introducing Kafka's Streams API


Wrapping Up

Page 88: Introducing Kafka's Streams API


Where to go from here

• Kafka Streams is available in Confluent Platform 3.1 and in Apache Kafka 0.10.1
  • http://www.confluent.io/download

• Kafka Streams demos: https://github.com/confluentinc/examples
  • Java 7, Java 8+ with lambdas, and Scala
  • WordCount, Interactive Queries, Joins, Security, Windowing, Avro integration, …

• Confluent documentation: http://docs.confluent.io/current/streams/
  • Quickstart, Concepts, Architecture, Developer Guide, FAQ

• Recorded talks
  • Introduction to Kafka Streams: http://www.youtube.com/watch?v=o7zSLNiTZbA
  • Application Development and Data in the Emerging World of Stream Processing (higher-level talk): https://www.youtube.com/watch?v=JQnNHO5506w

Page 89: Introducing Kafka's Streams API


Thank You

Page 90: Introducing Kafka's Streams API

Appendix: Streams and Tables
A closer look

Page 91: Introducing Kafka's Streams API


Motivating example: continuously compute current users per geo-region

[Diagram: a real-time dashboard asking “How many users younger than 30y, per region?” is fed from the topics user-locations (mobile team) and user-prefs (web team), e.g. alice → (Asia, 25y, …), bob → (Europe, 46y, …).]

Page 92: Introducing Kafka's Streams API


Motivating example: continuously compute current users per geo-region

[Diagram: same setup; a new record “alice → Europe” arrives in the user-locations topic.]

Page 93: Introducing Kafka's Streams API


Motivating example: continuously compute current users per geo-region

[Diagram: the joined user profile updates from “alice → (Asia, 25y, …)” to “alice → (Europe, 25y, …)”.]

Page 94: Introducing Kafka's Streams API


Motivating example: continuously compute current users per geo-region

[Diagram: the dashboard counts update to reflect Alice's move: Asia 8 → 7 (−1), Europe 5 → 6 (+1).]

Page 95: Introducing Kafka's Streams API


Same data, but different use cases require different interpretations

alice San Francisco

alice New York City

alice Rio de Janeiro

alice Sydney

alice Beijing

alice Paris

alice Berlin

Page 96: Introducing Kafka's Streams API


Same data, but different use cases require different interpretations

alice San Francisco

alice New York City

alice Rio de Janeiro

alice Sydney

alice Beijing

alice Paris

alice Berlin

Use case 1: Frequent traveler status?

Use case 2: Current location?

Page 97: Introducing Kafka's Streams API


Same data, but different use cases require different interpretations

“Alice has been to SFO, NYC, Rio, Sydney, Beijing, Paris, and finally Berlin.”

“Alice is in SFO, NYC, Rio, Sydney, Beijing, Paris, Berlin right now.”

Use case 1: Frequent traveler status?
Use case 2: Current location?

Page 98: Introducing Kafka's Streams API


Same data, but different use cases require different interpretations

alice San Francisco

alice New York City

alice Rio de Janeiro

alice Sydney

alice Beijing

alice Paris

alice Berlin

Use case 1: Frequent traveler status?
Use case 2: Current location?

Page 99: Introducing Kafka's Streams API

[Slide repeated from the previous page.]

Page 100: Introducing Kafka's Streams API

[Slide repeated from the previous page.]

Page 101: Introducing Kafka's Streams API


Streams meet Tables

• When you need all the values of a key (example: all the places Alice has ever been to),
  the messages are interpreted as INSERTs (append), the topic as a record stream,
  and you’d read the Kafka topic into a KStream.

Page 102: Introducing Kafka's Streams API


Streams meet Tables

• When you need all the values of a key (example: all the places Alice has ever been to),
  the messages are interpreted as INSERTs (append), the topic as a record stream,
  and you’d read the Kafka topic into a KStream.

• When you need the latest value of a key (example: where Alice is right now),
  the messages are interpreted as UPSERTs (overwrite existing), the topic as a changelog stream,
  and you’d read the Kafka topic into a KTable.

Page 103: Introducing Kafka's Streams API


Same data, but different use cases require different interpretations

“Alice has been to SFO, NYC, Rio, Sydney, Beijing, Paris, and finally Berlin.”
→ Use case 1: Frequent traveler status? → KStream

“Alice is in SFO, NYC, Rio, Sydney, Beijing, Paris, Berlin right now.”
→ Use case 2: Current location? → KTable
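In code, the interpretation is chosen simply by how a topic is read. A sketch (the topic name is illustrative; for a given topic an application would pick one of the two readings):

// Record stream: every location update Alice has ever sent.
KStream<UserId, Location> allVisits = builder.stream("user-locations-topic");

// Changelog stream: only the latest location per user.
KTable<UserId, Location> currentLocation = builder.table("user-locations-topic");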

Page 104: Introducing Kafka's Streams API


Motivating example: continuously compute current users per geo-region

[Diagram repeated from the main deck: Alice's move updates the per-region dashboard counts (Asia 8 → 7, Europe 5 → 6).]

Page 105: Introducing Kafka's Streams API


Motivating example: continuously compute current users per geo-region

KTable<UserId, Location> userLocations = builder.table("user-locations-topic");
KTable<UserId, Prefs>    userPrefs     = builder.table("user-preferences-topic");

Page 106: Introducing Kafka's Streams API


Motivating example: continuously compute current users per geo-region

[Diagram: a new record “alice → Europe” in user-locations updates the KTable userProfiles from “alice → (Asia, 25y, …)” to “alice → (Europe, 25y, …)”; other entries such as “bob → (Europe, 46y, …)” are unchanged.]

KTable<UserId, Location> userLocations = builder.table("user-locations-topic");
KTable<UserId, Prefs>    userPrefs     = builder.table("user-preferences-topic");

// Merge into detailed user profiles (continuously updated)
KTable<UserId, UserProfile> userProfiles =
    userLocations.join(userPrefs, (loc, prefs) -> new UserProfile(loc, prefs));

Page 107: Introducing Kafka's Streams API


Motivating example: continuously compute current users per geo-region

KTable<UserId, Location> userLocations = builder.table("user-locations-topic");
KTable<UserId, Prefs>    userPrefs     = builder.table("user-preferences-topic");

// Merge into detailed user profiles (continuously updated)
KTable<UserId, UserProfile> userProfiles =
    userLocations.join(userPrefs, (loc, prefs) -> new UserProfile(loc, prefs));

// Compute per-region statistics (continuously updated);
// the result is keyed by the grouped-by location
KTable<Location, Long> usersPerRegion = userProfiles
    .filter((userId, profile)  -> profile.age < 30)
    .groupBy((userId, profile) -> profile.location)
    .count();

[Diagram: the update “alice → Europe” changes the KTable usersPerRegion from {Africa 3, Asia 8, Europe 5, …} to {Africa 3, Asia 7, Europe 6, …}.]

Page 108: Introducing Kafka's Streams API


Motivating example: continuously compute current users per geo-region

[Diagram repeated: the dashboard shows the updated per-region counts after Alice's move (Asia 8 → 7, Europe 5 → 6).]

Page 109: Introducing Kafka's Streams API


Another common use case: continuous transformations

• Example: to enrich an input stream (user clicks) with side data (current user profile)

[Diagram: a KStream of user clicks read from the user-clicks topic (at 1M msgs/s) – the “facts”, e.g. alice → “/rental/p8454vb, 06:59 PM PDT”.]

Page 110: Introducing Kafka's Streams API


Another common use case: continuous transformations

• Example: to enrich an input stream (user clicks) with side data (current user profile)

[Diagram: the clicks KStream (“facts”, from the user-clicks topic at 1M msgs/s) is now accompanied by a KTable of user profiles (“dimensions”, from the user-profiles topic), e.g. alice → (Asia, 25y), bob → (Europe, 46y).]

Page 111: Introducing Kafka's Streams API


Another common use case: continuous transformations

• Example: to enrich an input stream (user clicks) with side data (current user profile)

[Diagram: stream.JOIN(table) enriches each click with the user's current profile, producing e.g. alice → “/rental/p8454vb, 06:59 PDT, Asia, 25y”.]

Page 112: Introducing Kafka's Streams API


Another common use case: continuous transformations

• Example: to enrich an input stream (user clicks) with side data (current user profile)

[Diagram: a new update for alice from the user-locations topic changes the KTable to alice → (Europe, 25y); subsequent clicks are enriched with the new profile.]
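A sketch of such an enrichment join in the DSL (the Click, Profile, and EnrichedClick types and the topic names are illustrative):

KStream<UserId, Click> clicks = builder.stream("user-clicks-topic");
KTable<UserId, Profile> profiles = builder.table("user-profiles-topic");

// Enrich each click (“fact”) with the user's current profile (“dimension”).
// leftJoin keeps clicks whose user has no profile yet.
KStream<UserId, EnrichedClick> enrichedClicks =
    clicks.leftJoin(profiles, (click, profile) -> new EnrichedClick(click, profile));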

Page 113: Introducing Kafka's Streams API


Appendix: Interactive Queries
A closer look

Page 114: Introducing Kafka's Streams API


Interactive Queries

Page 115: Introducing Kafka's Streams API


Interactive Queries

[Diagram: three app instances, each holding a share of the application's state: charlie → 3, bob → 5, alice → 2]

Page 116: Introducing Kafka's Streams API


Interactive Queries

New API to access local state stores of an app instance

[Diagram: each instance can query its own local share of the state: charlie → 3, bob → 5, alice → 2]

Page 117: Introducing Kafka's Streams API


Interactive Queries

New API to discover running app instances

[Diagram: instances at “host1:4460”, “host5:5307”, “host3:4777”, each holding part of the state]

Page 118: Introducing Kafka's Streams API


Interactive Queries

You: inter-app communication (RPC layer)
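A sketch of the two APIs as introduced in 0.10.1, where streams is the running KafkaStreams instance (the store name "word-counts" is illustrative; the RPC layer between instances is yours to choose):

import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;
import org.apache.kafka.streams.state.StreamsMetadata;

// Access a local state store of this app instance (read-only).
ReadOnlyKeyValueStore<String, Long> localStore =
    streams.store("word-counts", QueryableStoreTypes.keyValueStore());
Long localCount = localStore.get("alice");

// Discover which instance (host:port) holds the state for a given key,
// then query that instance over your own RPC layer (e.g. REST).
StreamsMetadata owner =
    streams.metadataForKey("word-counts", "alice", new StringSerializer());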