Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Transcript of Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Page 1: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Apache Kafka + Machine Learning
Analytic Models Applied to Real Time Stream Processing

Kai Waehner
Technology Evangelist

LinkedIn

@KaiWaehner

www.kai-waehner.de

Page 2: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Agenda

1) Machine Learning in the Real World
2) Building an Analytic Model
3) Applying an Analytic Model in Real Time
4) Online Training of Models

Page 3: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Agenda

1) Machine Learning in the Real World
2) Building an Analytic Model
3) Applying an Analytic Model in Real Time
4) Online Training of Models

Page 4: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Machine Learning

... allows computers to find hidden insights without being explicitly programmed where to look.

Page 5: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Real World Examples of Machine Learning

Spam Detection

Search Results + Product Recommendation

Picture Detection (Friends, Locations, Products)

Your Company

The Next Disruption: Google Beats Go Champion

Page 6: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Leverage Machine Learning to Analyze and Act on Critical Business Moments

Seconds Minutes Hours

Price Optimization

Predictive Maintenance

Fraud Detection

Cross Selling

Transportation Rerouting

Customer Service

Inventory Management

Windows of Opportunity

Page 7: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

How to realize these use cases?

Page 8: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Big Data Analytics

Volume (terabytes, petabytes)

Variety (social networks, blog posts, logs, sensors, etc.)

Velocity ("real time")

Value

Page 9: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Big Data Analytics for Actionable Insights

From Insight to Action

(continuously closed loop)

Page 10: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

[Architecture diagram: Streaming Platform and Big Data Analytics]

Components shown: Database, IoT Device, Streaming Producer, DWH, Data Integration, Connect, Data Lake, Model Building, Batch, Real Time, Stream Processing, Model, REST Interface, Mobile App, Streaming Consumer, BI Tool, Messaging, Web Application, Schema Registry / Governance

1) Data Producer
2) Analytics Platform
3) Streaming Platform
4) Data Consumer

Page 11: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Agenda

1) Machine Learning in the Real World
2) Building an Analytic Model
3) Applying an Analytic Model in Real Time
4) Online Training of Models

Page 12: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

[Architecture diagram repeated: Streaming Platform and Big Data Analytics, with 1) Data Producer, 2) Analytics Platform, 3) Streaming Platform, 4) Data Consumer]

Page 13: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Hidden Technical Debt in Machine Learning Systems

https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf

Writing source code is not the time-consuming task!

Page 14: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Analytical Pipeline

1. Data Access

2. Data Preparation

3. Exploratory Data Analysis

4. Model Building

5. Model Execution

6. Model Validation

7. Deployment

Page 15: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Data Access

Find insights to create added business value by correlating various data sources!

Page 16: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Data Preparation

http://www.slideshare.net/odsc/feature-engineering

Page 17: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Exploratory Data Analysis

© Copyright 2000-2017 TIBCO Software Inc.

• Scripting

• Visual Analytics

• Machine Learning

Page 18: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Model Building

A model is a simplification of the truth that helps you with decision making.

Page 19: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Model Execution (Coding)

Apply Model to New Data

Page 20: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Model Execution (Tooling)

Apply Model to New Data

Page 21: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Model Validation

https://genome.tugraz.at/proclassify/help/pages/XV.html

Cross-Validation Procedure
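
The cross-validation procedure named on this slide can be illustrated with a minimal sketch. It assumes plain k-fold splitting without shuffling; the function name `k_fold_indices` is hypothetical and not taken from the talk or from any particular library.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# Each sample is used for validation exactly once across the folds.
folds = list(k_fold_indices(n_samples=10, k=3))
```

A model would be trained on each `train` part and scored on the matching `validation` part; averaging the k scores gives the cross-validated estimate.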

Page 22: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Frameworks and Tooling?

Page 23: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Languages, Frameworks and Tools

Many more ….

Portable Format for Analytics (PFA)

Page 24: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Live Demos with Open Source Technologies

Development of Analytic Models with R, TensorFlow, Apache Spark, H2O.ai, RapidMiner

Page 25: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Live Demo

Use Case: Customer Churn Prediction

Machine Learning Algorithm: Generalized Linear Model (GLM) using Logistic Regression

Technology: Open Source R
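
As a rough illustration of what applying a logistic regression churn model involves, the sketch below scores one customer with the logistic function. The demo in the talk uses R; this is a Python stand-in, and the feature names and coefficient values are invented for illustration, not taken from the demo.

```python
import math
from typing import Dict

# Hypothetical coefficients, as they might be exported from a fitted GLM.
INTERCEPT = -1.5
COEFFICIENTS: Dict[str, float] = {
    "support_calls": 0.8,
    "months_as_customer": -0.05,
}


def churn_probability(customer: Dict[str, float]) -> float:
    """Return P(churn) as the logistic function of the linear predictor."""
    linear = INTERCEPT + sum(
        weight * customer[name] for name, weight in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


p_loyal = churn_probability({"support_calls": 0, "months_as_customer": 48})
p_risky = churn_probability({"support_calls": 5, "months_as_customer": 3})
```

With these made-up coefficients, many support calls raise the churn probability and a long customer relationship lowers it.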

Page 26: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Live Demo

Use Case: Airline Flight Delay Prediction

Machine Learning Algorithm: Gradient Boosted Machines (GBM) using Decision Trees

Technology: H2O.ai

Page 27: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Live Demo

Use Case: Predictive Maintenance (Anomaly Detection in Telco Networks)

Deep Learning Algorithm: Artificial Neural Networks (ANN) using Autoencoders

Technology: TensorFlow + Python API

Page 28: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Live Demo

Use Case: Classification (Prediction of Titanic Survivors)

Deep Learning Algorithm: Recurrent Neural Networks (RNN)

Technology: RapidMiner

Page 29: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Agenda

1) Machine Learning in the Real World
2) Building an Analytic Model
3) Applying an Analytic Model in Real Time
4) Online Training of Models

Page 30: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Analytical Pipeline

1. Data Access

2. Data Preparation

3. Exploratory Data Analysis

4. Model Building

5. Model Execution

6. Model Validation

7. Deployment

Page 31: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

[Architecture diagram repeated: Streaming Platform and Big Data Analytics, with 1) Data Producer, 2) Analytics Platform, 3) Streaming Platform, 4) Data Consumer]

Page 32: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Definition of Stream Processing

Data at Rest vs. Data in Motion

Page 33: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Key Concepts

Page 34: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Key Concepts

Page 35: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Key Concepts

Page 36: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Stream Processing

Use Cases
• Real Time Applications
• Stateful Streaming Analytics
• Stateless “Real Time ETL”

Page 37: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Event Processing Windows

Various Options for Windowing (Fixed, Sliding, Session, …)
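
Two of the windowing options listed above can be sketched in a few lines. This is a simplified, in-memory illustration that assumes integer timestamps in seconds and ignores late-arriving events; the function names are hypothetical and this is not the Kafka Streams windowing API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, str]  # (timestamp in seconds, key)


def fixed_window_counts(events: Iterable[Event], size: int) -> Dict[Tuple[str, int], int]:
    """Count events per key in fixed, non-overlapping windows of `size` seconds."""
    counts: Dict[Tuple[str, int], int] = defaultdict(int)
    for timestamp, key in events:
        window_start = timestamp - (timestamp % size)
        counts[(key, window_start)] += 1
    return dict(counts)


def session_windows(timestamps: Iterable[int], gap: int) -> List[Tuple[int, int]]:
    """Group timestamps into sessions; a pause longer than `gap` starts a new one."""
    sessions: List[Tuple[int, int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] <= gap:
            sessions[-1] = (sessions[-1][0], ts)  # extend the current session
        else:
            sessions.append((ts, ts))  # open a new session
    return sessions


events = [(1, "a"), (4, "a"), (7, "b"), (11, "a"), (12, "a")]
fixed = fixed_window_counts(events, size=10)
sessions = session_windows([1, 4, 11, 12, 30], gap=5)
```

Fixed windows have a constant length, while session windows grow with activity and close after a period of inactivity.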

Page 38: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

How to apply analytic models to real time processing without redevelopment?

Page 39: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Application of Analytic Models to Real Time without Redevelopment

Stream Processing

H2O.ai

R

Python

Spark ML

MATLAB

SAS

PMML

Page 40: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Streaming Analytics - Processing Pipeline

[Pipeline diagram]

Stream Ingest: APIs, Adapters / Channels, Integration, Messaging

Stream Preprocessing: Transformation, Aggregation, Enrichment, Filtering

Stream Analytics:
• Contextual Rules
• Windowing
• Patterns
• Analytics
• Machine Learning
• …

Stream Outcomes: Process Management, Analytics (Real Time), Applications & APIs, Analytics / DW Reporting

Also shown: Index / Search, Normalization

Applying an Analytic Model is just a piece of the puzzle!

Page 41: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Frameworks and Tooling?

Page 42: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Frameworks and Products

[Quadrant diagram. Axes: Open Source vs. Closed Source, Product vs. Framework]

Microsoft Azure Stream Analytics

Page 43: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

When to use Kafka Streams for Stream Processing?

Page 44: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

When to use Kafka Streams for Stream Processing?

No need for a Big Data cluster

Deploy in your existing infrastructure

Kafka manages scalability / fail-over

Focus on development of business logic in your department

Page 45: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Kafka Streams

Input Stream (Kafka Topic, Kafka Cluster)

→ Stream Processing Microservice (Kafka Streams): map, filter, aggregate, apply analytic model, "any business logic"

→ Output Stream (Kafka Topic, Kafka Cluster)

Deployed anywhere: Docker, Kubernetes, Mesos, Java App, …

Page 46: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

A complete streaming microservice, ready for production at large scale

WordCount

App configuration

Define processing (here: WordCount)

Start processing

Page 47: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Confluent Platform: the Free, Open-Source Streaming Platform

[Diagram: Confluent Platform components. Legend: Open Source, External, Commercial]

Data sources: Database Changes, Log Events, IoT Data, Web Events, …

Apache Kafka: Kafka Core, Kafka Connect, Kafka Streams

Confluent Platform: Control Center, Auto-data Balancing, Multi-Data Center Replication, 24/7 Support, Supported Connectors, Clients, Schema Registry, REST Proxy

Connected systems and uses: Monitoring, Analytics, Custom Apps, Transformations, Real-time Applications, CRM, Data Warehouse, Database, Hadoop, Data Integration

Page 48: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

[Architecture diagram repeated: Streaming Platform and Big Data Analytics, with 1) Data Producer, 2) Analytics Platform, 3) Streaming Platform, 4) Data Consumer]

Page 49: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

[Architecture diagram with example technologies: Streaming Platform and Big Data Analytics]

Technologies shown: Oracle DB, CoAP IoT, Kafka Java Client, HP Vertica, Data Integration, Flume, H2O.ai, Spark, TensorFlow, Batch, Real Time, Confluent REST Proxy, MQTT IoT, iPhone App, Kafka Go Client, Kafka Connect, Hive, Grafana, Kafka, Java EE Web App, Hadoop, Confluent Schema Registry, Kafka Streams with H2O.ai (Mesos), Kafka Streams with TensorFlow (Kubernetes), Avro

1) Data Producer
2) Analytics Platform
3) Streaming Platform
4) Data Consumer

Page 50: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Live Demos with Open Source Technologies

Development of Analytic Models with Apache Kafka Messaging, Kafka Streams, Kafka Connect, Confluent Schema Registry

Page 51: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Live Demo

Use Case: Airline Flight Delay Prediction

Machine Learning Algorithm: Any! (in our example, H2O.ai GBM)

Streaming Platform: Apache Kafka Core, Kafka Connect, Kafka Streams, Confluent Schema Registry

Page 52: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

H2O.ai Model + Kafka Streams

Filter

Map

1) Create H2O ML model

2) Configure Kafka Streams Application

3) Apply H2O ML model to Streaming Data

4) Start Kafka Streams App
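
The four steps above can be mimicked in plain Python to show the shape of the processing. This is only a sketch: the real demo uses the Kafka Streams Java API with an H2O.ai model, whereas here `predict_delay` is a hypothetical stand-in rule, the topics are ordinary lists, and the field names are invented.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Flight = Dict[str, Any]


def predict_delay(flight: Flight) -> str:
    """Step 1 stand-in: a real application would call the trained model's
    scoring code here instead of this hand-written rule."""
    return "YES" if flight["departure_hour"] >= 17 else "NO"


def build_topology(model: Callable[[Flight], str]) -> Callable[[Iterable[Flight]], Iterator[Flight]]:
    """Steps 2 and 3: define the processing as filter followed by map."""
    def process(stream: Iterable[Flight]) -> Iterator[Flight]:
        for flight in stream:
            if flight.get("departure_hour") is None:
                continue  # Filter: drop records the model cannot score
            # Map: enrich the record with the model's prediction
            yield {**flight, "predicted_delay": model(flight)}
    return process


# Step 4: start processing (lists stand in for the input and output topics).
input_topic: List[Flight] = [
    {"flight": "A1", "departure_hour": 9},
    {"flight": "B2", "departure_hour": None},
    {"flight": "C3", "departure_hour": 19},
]
topology = build_topology(predict_delay)
output_topic = list(topology(input_topic))
```

Passing the model in as a function keeps the stream logic independent of the framework that produced the model, which is the point of applying models "without redevelopment".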

Page 53: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

End-to-End Stream Monitoring and Alerting

Confluent Control Center
• Data Stream Monitoring and Alerting
• Multi-cluster monitoring and management
• Kafka Connect Configuration

• Message delivery?
• Delays?
• Where did it get stuck?
• Lost messages?
• Broker issues?
• Performance?

http://docs.confluent.io/3.2.0/control-center/docs/monitoring.html

Page 54: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Agenda

1) Machine Learning in the Real World
2) Building an Analytic Model
3) Applying an Analytic Model in Real Time
4) Online Training of Models

Page 55: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Let’s improve the analytic model continuously…

Page 56: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Analytical Pipeline

1. Data Access

2. Data Preparation

3. Exploratory Data Analysis

4. Model Building

5. Model Execution

6. Model Validation

7. Deployment

Online Training

Continuously train and improve the model with every new event
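
One way to picture "train with every new event" is stochastic gradient descent on a logistic regression model, where each labeled event triggers a single weight update. The sketch below assumes that technique; the class name and learning rate are hypothetical, and the talk does not prescribe a specific algorithm.

```python
import math
from typing import List, Sequence


class OnlineLogisticRegression:
    """Logistic regression updated one labeled event at a time (SGD)."""

    def __init__(self, n_features: int, learning_rate: float = 0.1) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x: Sequence[float]) -> float:
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x: Sequence[float], label: int) -> None:
        # Gradient of the log loss for a single example is (p - y) * x.
        error = self.predict_proba(x) - label
        self.weights = [
            w - self.learning_rate * error * xi for w, xi in zip(self.weights, x)
        ]
        self.bias -= self.learning_rate * error


model = OnlineLogisticRegression(n_features=1)
before = model.predict_proba([1.0])

# Simulated event stream: positive feature values belong to class 1.
events = [([1.0], 1), ([-1.0], 0)] * 50
for features, label in events:
    model.learn_one(features, label)

after = model.predict_proba([1.0])
```

The untrained model is undecided (probability 0.5); after consuming the stream it leans clearly toward the right class, without ever seeing the data as a batch.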

Page 57: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Online Model Training of Analytic Models

How to improve models?

1. Manual Update

2. Automated Batch

3. Real Time

Page 58: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

[Architecture diagram with example technologies: Streaming Platform and Big Data Analytics]

Technologies shown: Flume, H2O.ai, Spark, TensorFlow, Hive, Kafka, Hadoop, Confluent Schema Registry, Kafka Streams with H2O.ai (Mesos), Kafka Streams with TensorFlow (Kubernetes), Avro

1) Get new Input Event via Kafka Topic

2) Improve Model in Big Data Cluster

3) Update deployed Model via Kafka Topic

4) Leverage Improved Model for new Events
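
Steps 1, 3 and 4 of this loop can be sketched as a consumer that reads both events and model updates and swaps the model in place. The topic names, the threshold "model", and the message format are all hypothetical; step 2 (retraining in the big data cluster) is outside the sketch and is represented only by the new threshold arriving on the update topic.

```python
from typing import Callable, Iterable, List, Tuple

Model = Callable[[float], str]


def make_threshold_model(threshold: float) -> Model:
    """Build a trivial model that flags values above the threshold."""
    return lambda value: "ALERT" if value > threshold else "OK"


def run(messages: Iterable[Tuple[str, float]], initial: Model) -> List[str]:
    """Score events with the current model, replacing it when an update arrives."""
    model = initial
    results: List[str] = []
    for topic, payload in messages:
        if topic == "model-updates":
            model = make_threshold_model(payload)  # step 3: update deployed model
        elif topic == "events":
            results.append(model(payload))  # steps 1 and 4: score with current model
    return results


messages = [("events", 7.0), ("model-updates", 5.0), ("events", 7.0)]
results = run(messages, make_threshold_model(10.0))
```

The same input value is scored differently before and after the update, with no restart of the processing loop.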

Page 59: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Caveats for Online Model Training

• Processes and infrastructure not ready

• Validation needed before production

• Slows down the system

• Only a few ML implementations supported

• Many use cases do not need it

Page 60: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Key Take-Aways

• Insights are hidden in Historical Data on Big Data Platforms

• Machine Learning and Big Data Analytics find these Insights by building Analytic Models

• Streaming Platform uses these Models (without Redevelopment) to take Action in Real Time

Page 61: Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams

Kai Waehner
Technology Evangelist

@KaiWaehner
www.kai-waehner.de
LinkedIn

Questions? Feedback? Please contact me!