Practical data science use cases in the 4th industrial revolution

Stefan Steffen

AGENDA

1. The critical context that defines our future landscape

• Exponential change in resources

• Acceleration in algorithms

• Speed of light limitations

2. Practical use cases to capitalize on this disruptive change

• Machine learning in telecoms

• Computer vision in retail customer experience

• Data science and IOT in smart farming

• Cassandra and Kafka in supply chain

01 The critical context that defines our future landscape

01 From steam power to connected devices

1st Industrial Revolution: Mechanisation of production supported by water and steam power

2nd Industrial Revolution: Mass production drives the 2nd industrial revolution with the help of electrical power

3rd Industrial Revolution: Further automation of the production process by combining electronics and computers

4th Industrial Revolution: The advent of big data, connected devices, robotics and the move towards artificial intelligence fuels the 4th industrial revolution

i. Prof Klaus Schwab, The Fourth Industrial Revolution (WEF, 2016)

01 Exponential change in resources

Connected devices: from 23bn in 2018 to 75bn in 2025
IHS: IOT platforms - Enabling the Internet of Things

Proliferation of data: 163ZB by 2025 (ZB = 1000bn GB)
IDC: Data Age 2025

Processing power: trillion-fold increase in 60 years
https://pages.experts-exchange.com/processing-power-compared

Cost of storage: now less than R0.50 per GB…
https://www.backblaze.com/blog/hard-drive-cost-per-gigabyte/

01 Acceleration in algorithms

Machines can see: Cloud Vision API

Machines can listen: speech recognition

Machines can read: natural language processing

Machines can speak: Existor algorithm

01 Speed of light limitations drive computing to the edge

Virtual reality: our eyes produce over 5Gb of data per eye per second, needing sub 7ms round trip latency to avoid illness

Autonomous cars: 5Tb of data generated per day with requirement for sub 5ms decisions

Latency: perceptible lag once latency exceeds 100-200ms (figure labels: 30ms, 188ms)

https://help.dyn.com/understanding-market-cities-and-cloud-performance/
Derek Collison – The future of Edge Computing
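A rough back-of-envelope check of why latency budgets like these push computing towards the edge. The sketch assumes signals travel through optical fibre at roughly two thirds of the vacuum speed of light and ignores all processing, queuing and routing delay, so real-world distances would be shorter. The function name and the fibre fraction are illustrative assumptions, not figures from the slides.

```python
# Upper bound on how far away a server can be for a given round-trip budget.
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
FIBRE_FRACTION = 2 / 3             # assumed typical slowdown in optical fibre


def max_one_way_distance_km(round_trip_ms: float) -> float:
    """Largest one-way distance reachable within the round-trip budget,
    counting propagation delay only (no processing or queuing)."""
    one_way_seconds = (round_trip_ms / 1000) / 2
    return SPEED_OF_LIGHT_KM_S * FIBRE_FRACTION * one_way_seconds


if __name__ == "__main__":
    for budget_ms in (5, 7, 30, 188):
        distance = max_one_way_distance_km(budget_ms)
        print(f"{budget_ms:>4} ms round trip -> at most {distance:,.0f} km away")
```

Under these assumptions a 5ms budget caps the distance to the server at roughly 500 km before any computation has happened, which is why such decisions cannot depend on a distant data centre.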

02 Practical use cases to capitalize on this disruptive change

02 Overview of Machine Learning in telecoms

Network operations

- Network monitoring

- Fault detection

- Network optimisation

- Predictive maintenance

Product management

- Personalised marketing

- Product recommenders

- Churn prediction

- Dynamic discounting

Service delivery

- Service chatbots

- Voice interfaces

- Predictive identification of service failures

Support functions

- Revenue assurance

- Fraud detection

- Cyber security anomaly detection

02 Basic K-means clustering for customer segmentation

1. Randomly initialize the cluster centroids

2. Assign each data point to the closest cluster centroid

3. Move each cluster centroid to the mean of its assigned data points

4. Repeat steps 2 and 3 until there is no change in the clusters

5. Run K-means many times, starting with different random centroids

6. Examine the distortion of each run by reviewing the sum of squared differences between the data points and their cluster centroids

Manu Jeevan: Possibly the simplest way to explain K-Means algorithm
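The six steps above can be sketched in plain Python. This is a minimal illustration rather than a production implementation: the function names (`sq_dist`, `kmeans`, `best_of`), the choice of existing data points as starting centroids, and the two-feature customer records are all assumptions made for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, max_iter=100):
    """One K-means run (steps 1 to 4). Returns (centroids, labels, distortion)."""
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Step 4: stop once the assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    # Step 6: distortion is the sum of squared distances to the assigned centroid.
    distortion = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return centroids, labels, distortion


def best_of(points, k, runs=10, seed=0):
    """Step 5: repeat with different random starts and keep the lowest distortion."""
    rng = random.Random(seed)
    return min((kmeans(points, k, rng) for _ in range(runs)), key=lambda r: r[2])


if __name__ == "__main__":
    # Hypothetical customers: (monthly spend, monthly data usage in GB).
    customers = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, labels, distortion = best_of(customers, k=2)
    print(labels, round(distortion, 3))
```

In practice the features would be scaled first so that no single attribute dominates the distance, and a library implementation would normally replace this hand-written loop.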

02 Computer Vision in retail customer experience

02 Artificial neural networks in practice

A quick overview

1. Neurons in the brain

2. Artificial neural network

3. Input to Output layers

4. Hidden layers

5. Filters for edges & shapes

6. Pattern detection

https://playground.tensorflow.org/
Michael Nielsen: http://neuralnetworksanddeeplearning.com/index.html
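To make the input, hidden and output layers in the overview concrete, here is a tiny feed-forward network with one hidden layer. It is a sketch under stated assumptions: the weights are hand-picked (not trained) so that the network computes XOR, the activation is a simple step function, and all names are hypothetical. Real computer vision networks learn their weights from data and use many more layers.

```python
def step(x):
    """Step activation: the neuron fires (1) when its input is positive."""
    return 1 if x > 0 else 0


def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias, then activation."""
    return [
        step(sum(w * i for w, i in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hand-picked weights (an assumption for illustration, not learned).
HIDDEN_W = [[1, 1], [1, 1]]  # two hidden neurons, each sees both inputs
HIDDEN_B = [-0.5, -1.5]      # first fires on "at least one", second on "both"
OUTPUT_W = [[1, -1]]         # output fires on "at least one but not both"
OUTPUT_B = [-0.5]


def forward(x1, x2):
    """Pass two binary inputs through the hidden layer to the output layer."""
    hidden = layer([x1, x2], HIDDEN_W, HIDDEN_B)
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", forward(a, b))
```

The hidden layer is what lets the network represent a pattern that no single neuron can separate on its own, which is the same role hidden layers play when detecting edges and shapes in images.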

02 IOT applications in Smart farming

Data driven decision making | Visualisation dashboard | Security

Edge orchestration layer that connects all end smart devices onto a common layer

02 Crop growth forecasting using heat units & soil moisture

Crop growth stage requirements (GDDs)

Weather API
• Heat units / chill units
• One year historical
• 10 days forecast

Sensor inputs
• Heat units
• Soil moisture

DB
• Soil moisture trends
• Temperature trends

Growth maturity requirement reached

02 Crop growth forecasting using heat units & soil moisture

Single sine curve method
[Figure: daily temperature curve plotted against an upper and a lower temperature threshold. Axes: hour of the day (0 to 24), temperature in degrees C (0 to 35).]

100 day heat unit forecast
[Figure: accumulated heat units (DD). Black line = actual heat unit accumulation from plant date to today. Blue line = forecast with 80% and 95% confidence intervals.]

Heat units accumulated after planting
[Figure: axis values 0, 1400, 1700, 2000, 2300 and 3200, with growth stages marked in this order: planting date, first bloom, early bloom, peak bloom, physiological maturity, harvest.]

Paul W Brown: Heat Units – University of Arizona Cooperative Extension
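The single sine curve method named above can be approximated numerically. The sketch below assumes the day's temperature follows one sine wave between the daily minimum and maximum, counts only the part of the curve above the lower threshold, and caps it at the upper threshold (a horizontal cutoff, which is an assumption here). The function names and the 10 and 30 degree thresholds in the demo are illustrative and not taken from the slides.

```python
import math


def daily_heat_units(t_min, t_max, lower, upper, steps=1440):
    """Heat units (degree-days) for one day, by sampling a sine-shaped
    temperature curve and averaging the portion between the thresholds."""
    mean = (t_max + t_min) / 2
    amplitude = (t_max - t_min) / 2
    total = 0.0
    for i in range(steps):
        angle = 2 * math.pi * i / steps
        temperature = mean + amplitude * math.sin(angle)
        # Clip to the thresholds: nothing counts below `lower`,
        # and nothing extra counts above `upper`.
        effective = min(max(temperature, lower), upper)
        total += effective - lower
    return total / steps


def accumulate(days, lower, upper):
    """Running total of heat units over a list of (t_min, t_max) days."""
    running, total = [], 0.0
    for t_min, t_max in days:
        total += daily_heat_units(t_min, t_max, lower, upper)
        running.append(total)
    return running


if __name__ == "__main__":
    # Hypothetical daily (min, max) temperatures in degrees C.
    week = [(12, 28), (14, 31), (9, 24), (15, 33), (16, 30), (11, 26), (13, 29)]
    print([round(v, 1) for v in accumulate(week, lower=10, upper=30)])
```

Comparing the running total against a crop's growth stage requirements gives the point at which each stage is expected to be reached, and feeding in forecast temperatures extends the same total into the future.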

02 Real-time supply chain stock visibility

Lack of stock visibility from the Distribution Centre to Stores

Inefficient movement of stock through supply chain

Inadequate allocation of stock causing lower trade

Poor customer experience

Distribution Centre
• Planning
• Logistics

Multiple stores
• Operations
• Floor staff

02 An architecture view of a stock visibility deployment

Various Users

Wrapped in

Event Capturing

Real-time NoSQL Data-Store

Dashboard Visualizations

02 Understanding the components

Kubernetes overview

Old way: Applications on host. Heavy-weight, non-portable. Relies on OS package manager.

New way: Deploy containers. Small and fast, portable. Uses OS-level virtualisation.

Kubernetes can be thought of as:
• A container platform
• A microservices platform
• A portable cloud platform

https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/

Cassandra overview

Apache Cassandra is a high performance, extremely scalable, fault tolerant (i.e. no single point of failure), distributed post-relational database solution.

Datastax Cassandra Tutorials: Apache Cassandra Overview

Kafka overview

Apache Kafka is a distributed streaming platform.
• Publish and subscribe to streams of records.
• Store streams of records in a fault-tolerant way.
• Process streams of records as they occur.

Kafka is generally used for two broad applications:
• Building real-time streaming data pipelines that reliably get data between systems or applications
• Building real-time streaming applications that transform or react to the streams of data

https://www.cloudera.com/documentation/kafka/1-2-x/topics/kafka.html
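The publish, store and process ideas in the Kafka overview can be illustrated with a toy, in-memory model. This is not the Kafka client API and it has no fault tolerance, partitions or networking. The class and method names (`TopicLog`, `publish`, `poll`) and the stock-movement records are hypothetical, chosen only for this sketch. What it shows is an append-only log in which each consumer group tracks its own read position, so several readers can process the same stream independently.

```python
from collections import defaultdict


class TopicLog:
    """Toy stand-in for a single topic: an append-only list of records."""

    def __init__(self):
        self._records = []                # stored stream of records
        self._offsets = defaultdict(int)  # next offset to read, per group

    def publish(self, record):
        """Append a record and return the offset it was stored at."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group, max_records=10):
        """Return the next unread records for this group and advance its offset."""
        start = self._offsets[group]
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


if __name__ == "__main__":
    stock_events = TopicLog()
    stock_events.publish({"store": "A", "sku": "1001", "delta": -2})
    stock_events.publish({"store": "B", "sku": "1001", "delta": +12})

    # Two independent consumers read the same stream at their own pace.
    print("dashboard:", stock_events.poll("dashboard"))
    print("planning: ", stock_events.poll("planning", max_records=1))
    print("planning: ", stock_events.poll("planning", max_records=1))
```

Because records are kept rather than removed on read, a new consumer such as a dashboard can be added later and still replay the full history of stock movements.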

Data science application in the 4th Industrial Revolution

The critical context that defines our future landscape

An exponential increase in resources, an acceleration in algorithms, and speed of light limitations driving computing to the edge.

There is practical data science application in every business.

Make sure you capitalize on this disruptive change.