NDC London 2017 - The Data Dichotomy- Rethinking Data and Services with Streams

Post on 21-Jan-2018

1

The Data Dichotomy: Rethinking data and services with streams

Ben Stopford

@benstopford

2

Build Features

Build for the Future

3

Evolution!

4

KAFKA

Serving Layer (Cassandra etc.)

Kafka Streams / KSQL

Streaming Platforms

Data is embedded in each engine

High Throughput Messaging

Clustered Java App

5

authorization_attempts possible_fraud

Streaming Example

6

CREATE STREAM possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 MINUTE)
GROUP BY card_number
HAVING count(*) > 3;

authorization_attempts possible_fraud
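
The logic of the query above can be sketched in plain Java. This is only an illustration, not KSQL or Kafka Streams: the `PossibleFraud` class, the `Attempt` type and the sample timestamps are hypothetical, and it scans a finite list where a streaming engine would update the counts continuously as events arrive.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: counts attempts per card in 5-minute tumbling windows.
class PossibleFraud {
    static final long WINDOW_MS = 5 * 60 * 1000L;

    // One authorization attempt: card number plus event time in milliseconds.
    static class Attempt {
        final String cardNumber;
        final long timestampMs;

        Attempt(String cardNumber, long timestampMs) {
            this.cardNumber = cardNumber;
            this.timestampMs = timestampMs;
        }
    }

    // Returns "card@windowStart" for every (card, window) pair with more than 3 attempts.
    static List<String> detect(List<Attempt> attempts) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Attempt a : attempts) {
            // Tumbling windows never overlap, so each event falls into exactly one window.
            long windowStart = (a.timestampMs / WINDOW_MS) * WINDOW_MS;
            counts.merge(a.cardNumber + "@" + windowStart, 1, Integer::sum);
        }
        List<String> flagged = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 3) { // HAVING count(*) > 3
                flagged.add(e.getKey());
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        List<Attempt> attempts = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            attempts.add(new Attempt("1111", i * 1000L));
        }
        attempts.add(new Attempt("2222", 0L));
        System.out.println(detect(attempts));
    }
}
```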

12

Streaming == Manipulating Data in Flight

13

Business Applications

14

Ecosystems

App

Increasingly we build ecosystems

15

SOA / Microservices / EDA

CustomerService

ShippingService

16

The Problem is DATA

17

Most services share the same core facts.

Catalog

Most services live in here

18

OrdersService

PaymentsService

CustomersService

Data becomes spread out and we need to bring it together

Useful Grid

19

Service A Service B

Service C

One option is to share a database

20

Service A Service B

Service C

Databases provide a very rich form of coupling

21

Two different forces compete in our designs

22

Single Sign On

Business Service

authorise(),

We are taught to encapsulate

LOOSE COUPLING!

23

But data systems have little to do with encapsulation

24

Service: data on inside, data on outside. Interface hides data.

Database: data on inside, data on outside. Interface amplifies data.

Databases amplify the data they hold

25

The data dichotomy

Data systems are about exposing data.

Services are about hiding it.

26

Microservices shouldn’t share a database!

27

Tension

We want all the good stuff which comes with a database.

We don’t want to share that database with anyone else.

But we do want to share datasets in a sensible way.

28

So how do we share data between services?

OrdersService

ShippingService

CustomerService

Webserver

29

Buying an iPad (with REST)

SubmitOrder

shipOrder() getCustomer()

OrdersService

ShippingService

CustomerService

Webserver

30

Buying an iPad with Events

Message Broker (Kafka)

Notification

Data is replicated (incrementally)

SubmitOrder

Order Created

Customer Updated

OrdersService

ShippingService

CustomerService

Webserver

KAFKA

31

Events for Notification Only

Message Broker (Kafka)

SubmitOrder

Order Created

getCustomer() REST

Notification

OrdersService

ShippingService

CustomerService

Webserver

KAFKA

32

Events for Data Locality

Customer Updated

SubmitOrder

Order Created

Data is replicated

OrdersService

ShippingService

CustomerService

Webserver

KAFKA

33

Events have two hats

Notification

Data replication

34

Events are the key to scalable service ecosystems

35

Streaming is the toolset for dealing with events as they move!

36

Streaming Platform

Connectors

The Log

Connectors

Producer Consumer

Streaming Engine

38

What is a Distributed Log?

39

Shard on the way in

ProducingServices

Kafka

ConsumingServices

40

Each shard is a queue

ProducingServices

Kafka

ConsumingServices

41

Consumers share load

ProducingServices

Kafka

ConsumingServices
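
The three slides above (shard on the way in, each shard is a queue, consumers share load) can be sketched as follows. This is a simplification under stated assumptions: `ShardedLog` is a hypothetical class, `String.hashCode` stands in for the hash a real client would use, and the modulo assignment in `ownerOf` stands in for a real group assignment strategy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a partitioned log.
class ShardedLog {
    final List<List<String>> partitions = new ArrayList<>();

    ShardedLog(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // Shard on the way in: the key alone decides the partition,
    // so all records for one key stay in order within one queue.
    int append(String key, String value) {
        int p = Math.floorMod(key.hashCode(), partitions.size());
        partitions.get(p).add(key + "=" + value);
        return p;
    }

    // Consumers share load: each partition is read by exactly one consumer in the group.
    static int ownerOf(int partition, int numConsumers) {
        return partition % numConsumers;
    }

    public static void main(String[] args) {
        ShardedLog log = new ShardedLog(3);
        int p = log.append("card-1", "attempt");
        System.out.println("partition " + p + " is read by consumer " + ownerOf(p, 2));
    }
}
```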

42

A log can be Rewound and Replayed

Rewind & Replay
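
Rewind and replay works because reading a log does not consume it: a reader only holds an offset. A minimal sketch, assuming a hypothetical in-memory `ReplayableLog` (a real log is persisted and partitioned):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical append-only log whose readers track their own position.
class ReplayableLog {
    private final List<String> records = new ArrayList<>();

    void append(String record) {
        records.add(record);
    }

    // A reader is just an offset into the log; reading never removes records.
    class Reader {
        private int offset = 0;

        // Returns everything from the current offset to the end, then advances.
        List<String> poll() {
            List<String> batch = new ArrayList<>(records.subList(offset, records.size()));
            offset = records.size();
            return batch;
        }

        // Rewind (or skip ahead) by moving the offset; valid range is 0..log size.
        void seek(int newOffset) {
            offset = newOffset;
        }
    }

    Reader newReader() {
        return new Reader();
    }

    public static void main(String[] args) {
        ReplayableLog log = new ReplayableLog();
        log.append("a");
        log.append("b");
        Reader reader = log.newReader();
        System.out.println(reader.poll());
        reader.seek(0); // rewind
        System.out.println(reader.poll());
    }
}
```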

43

Compacted Log (retains only latest version)

Version 3

Version 2

Version 1

Version 2

Version 1

Version 5

Version 4

Version 3

Version 2

Version 1
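
The effect of compaction can be sketched as "last write per key wins". The `LogCompaction` class below is hypothetical and compacts a whole list in one pass; it is meant to show the end result only, not how a broker performs compaction in the background.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of what a compacted log retains.
class LogCompaction {
    // Each log entry is a {key, value} pair; later entries supersede earlier ones.
    static Map<String, String> compact(List<String[]> log) {
        Map<String, String> latest = new LinkedHashMap<>();
        for (String[] entry : log) {
            latest.put(entry[0], entry[1]); // overwrite any older version of this key
        }
        return latest;
    }

    public static void main(String[] args) {
        List<String[]> log = new ArrayList<>();
        log.add(new String[] {"k1", "Version 1"});
        log.add(new String[] {"k1", "Version 2"});
        log.add(new String[] {"k2", "Version 1"});
        System.out.println(compact(log));
    }
}
```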

45

Kafka Connect

KafkaConnect

KafkaConnect

Kafka

47

A database engine for data-in-flight

48

SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 MINUTE)
GROUP BY card_number
HAVING count(*) > 3;

Continuously Running Queries

49

Features: similar to database query engine

Join

Filter

Aggregate

View

Window

50

Compacted Topic

Join

Stream

Table

Kafka

Kafka Streams / KSQL

Topic

Join Streams and Tables

51

Handle Asynchronicity

In an asynchronous world, will the payment come first, or the order?

KAFKA

Buffer 5 mins

Join by Key

52

Handle Asynchronicity

KAFKA

Buffer 5 mins

Join by Key

KStream orders = builder.stream("Orders");
KStream payments = builder.stream("Payments");

orders.join(payments, KeyValue::new, JoinWindows.of(1 * MIN))
      .peek((key, pair) -> emailer.sendMail(pair));
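
What the windowed join does underneath can be sketched without Kafka: buffer whichever side arrives first, and emit a result when the matching key shows up on the other side within the window. The `WindowedJoin` class is hypothetical and simplified (one event per key per side, no buffer expiry); it is not the Kafka Streams implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a join by key across two asynchronous streams.
class WindowedJoin {
    private final long windowMs;
    private final Map<String, Long> pendingOrders = new HashMap<>();   // key -> event time
    private final Map<String, Long> pendingPayments = new HashMap<>(); // key -> event time
    final List<String> joined = new ArrayList<>();

    WindowedJoin(long windowMs) {
        this.windowMs = windowMs;
    }

    void onOrder(String key, long timeMs) {
        match(key, timeMs, pendingOrders, pendingPayments);
    }

    void onPayment(String key, long timeMs) {
        match(key, timeMs, pendingPayments, pendingOrders);
    }

    // Buffer this side, then join if the other side arrived within the window.
    private void match(String key, long timeMs, Map<String, Long> mine, Map<String, Long> other) {
        mine.put(key, timeMs);
        Long otherTime = other.get(key);
        if (otherTime != null && Math.abs(timeMs - otherTime) <= windowMs) {
            joined.add(key);
        }
    }

    public static void main(String[] args) {
        WindowedJoin join = new WindowedJoin(5 * 60 * 1000L);
        join.onPayment("order-1", 0L);   // the payment happens to arrive first
        join.onOrder("order-1", 60_000L);
        System.out.println(join.joined);
    }
}
```

It does not matter whether the payment or the order comes first: both orders of arrival produce the same join result.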

53

KAFKA

Join

A KTable is just a stream with infinite retention

54

A KTable is just a stream with infinite retention

KStream orders = builder.stream("Orders");
KStream payments = builder.stream("Payments");
KTable customers = builder.table("Customers");

orders.join(payments, EmailTuple::new, JoinWindows.of(1 * MIN))
      .join(customers, (tuple, cust) -> tuple.setCust(cust))
      .peek((key, tuple) -> emailer.sendMail(tuple));

Materialize a table in two lines of code!
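
Joining a stream to a table can be pictured as a lookup against the latest value per key. A minimal plain-Java sketch, assuming a hypothetical `StreamTableJoin` class with customer emails as the table contents (neither appears in the talk):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration: a table is the latest value per key, kept up to date by events.
class StreamTableJoin {
    private final Map<String, String> customers = new HashMap<>(); // customerId -> email

    // Table side: every update overwrites the previous row for that customer.
    void onCustomerUpdate(String customerId, String email) {
        customers.put(customerId, email);
    }

    // Stream side: enrich each order with the current customer row.
    // Like an inner join, orders for unknown customers produce no output.
    Optional<String> onOrder(String orderId, String customerId) {
        String email = customers.get(customerId);
        return email == null ? Optional.empty() : Optional.of(orderId + " -> " + email);
    }

    public static void main(String[] args) {
        StreamTableJoin join = new StreamTableJoin();
        join.onCustomerUpdate("c1", "c1@example.com");
        System.out.println(join.onOrder("o1", "c1"));
    }
}
```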

55

KAFKA

Emailer

With KSQL and Node.js

Create stream ToEmail From Orders, Payment, Customer where …

56

Scales Out

57

Streaming is about

1. Processing data incrementally

2. Moving data to where it needs to be processed (quickly and efficiently)

On Notification

Data Replication

58

Steps to Streaming Services

59

1. Take Responsibility for the past and evolve

60

Stay Simple. Take Responsibility for the past

Browser

Webserver

61

Evolve Forwards

Browser

Webserver

Orders Service

62

2. Raise events. Don’t talk to services.

63

Raise events. Don’t talk to services

Browser

Webserver

Orders Service

64

KAFKA

Order Requested

Order Received

Browser

Webserver

OrdersService

Raise events. Don’t talk to services

65

KAFKA

Order Requested

Order Validated

Order Received

Browser

Webserver

OrdersService

Raise events. Don’t talk to services

66

KAFKA

Order Requested

Order Validated

Order Received

Browser

Webserver

OrdersService

Use Kafka as a Backbone for Events

67

3. Use Connect (& CDC) to evolve away from legacy

68

KAFKA

Order Requested

Order Validated

Order Received

Browser

Webserver

OrdersService

Evolve away from Legacy

69

KAFKA

Order Requested

Order Validated

Order Received

Browser

Webserver

OrdersService

Use the Database as a ‘Seam’

Connect

Products

70

4. Make use of Schemas

71

KAFKA

Order Requested

Order Validated

Order Received

Browser

Webserver

OrdersService

Schemas are your API

Connect

Products

Schema Registry

72

5. Use the Single Writer Principle

73

KAFKA

Order Requested

Order Validated

Order Received

Browser

Webserver

OrdersService

Apply the single writer principle

Connect

Products

Schema Registry

Order Completed

74

Orders Service

EmailService

T1 T2

T3

T4

RESTService

T5

Single Writer Principle

75

Single Writer Principle

- Creates local consistency points in the absence of Global Consistency

- Makes schema upgrades easier to manage.

76

6. Store Datasets in the Log

77

Messaging that Remembers

Orders Customers

Payments Stock

78

KAFKA

Order Requested

Order Validated

Order Received

Browser

Webserver

OrdersService

New Service, No Problem!

Connect

Products

Schema Registry

Order Completed Repricing

79

Orders Customers

Payments Stock

Single, Shared Source of Truth

80

But how do you query a log?

81

7. Move Data to Code

82

83

Connect

Order Requested

Order Validated

Order Completed

Order Received

Products

Browser

Webserver

Schema Registry

OrdersService Stock

Stock

Materialize Stock ‘View’ Inside Service

KAFKA

84

Connect

Order Requested

Order Validated

Order Completed

Order Received

Products

Browser

Webserver

Schema Registry

OrdersService Stock

Stock

Take only the data we need

KAFKA
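
Materializing a view inside the service, and taking only the data it needs, can be sketched as a projection applied to each incoming event. The `StockView` class and the fields of `ProductEvent` are hypothetical; the real Products topic may carry different fields.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: a local, queryable view built from a wider event stream.
class StockView {
    // Assumed shape of a product event; wider than what this service needs.
    static class ProductEvent {
        final String productId;
        final String description;
        final int stockLevel;

        ProductEvent(String productId, String description, int stockLevel) {
            this.productId = productId;
            this.description = description;
            this.stockLevel = stockLevel;
        }
    }

    private final Map<String, Integer> stockByProduct = new HashMap<>();

    // Keep only the one field this service queries, indexed by product id.
    void apply(ProductEvent event) {
        stockByProduct.put(event.productId, event.stockLevel);
    }

    int stockFor(String productId) {
        return stockByProduct.getOrDefault(productId, 0);
    }

    public static void main(String[] args) {
        StockView view = new StockView();
        view.apply(new ProductEvent("ipad", "Tablet", 17));
        System.out.println(view.stockFor("ipad"));
    }
}
```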

85

Data Movement

Be realistic:
• Network is no longer the bottleneck
• Indexing is:
    • In memory indexes help
    • Keep datasets focused

86

8. Use the log as a ‘database’

87

Connect

Order Requested

Order Validated

Order Completed

Order Received

Products

Browser

Webserver

Schema Registry

OrdersService

Reserved Stocks

Stock

Stock

Reserved Stocks

Apply Event Sourcing

KAFKA

Table

88

Connect

Order Requested

Order Validated

Order Completed

Order Received

Products

Browser

Webserver

Schema Registry

OrdersService

Reserved Stocks

Stock

Stock

Reserved Stocks

Order Service Loads Reserved Stocks on Startup

KAFKA
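
Loading state on startup is event sourcing in miniature: the table is whatever you get by applying every retained event in order. A minimal sketch, assuming a hypothetical `ReservedStocks` class and events encoded as `{productId, delta}` pairs (an encoding chosen for illustration only):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: service state rebuilt by replaying its own log.
class ReservedStocks {
    private final Map<String, Integer> reserved = new HashMap<>();

    // Applying an event is the only way state changes, so replay reproduces it exactly.
    void apply(String productId, int delta) {
        reserved.merge(productId, delta, Integer::sum);
    }

    int reservedFor(String productId) {
        return reserved.getOrDefault(productId, 0);
    }

    // On startup: rebuild the table from the retained log of {productId, delta} events.
    static ReservedStocks loadFrom(List<String[]> log) {
        ReservedStocks state = new ReservedStocks();
        for (String[] event : log) {
            state.apply(event[0], Integer.parseInt(event[1]));
        }
        return state;
    }

    public static void main(String[] args) {
        List<String[]> log = new ArrayList<>();
        log.add(new String[] {"ipad", "1"});
        log.add(new String[] {"ipad", "1"});
        System.out.println(loadFrom(log).reservedFor("ipad"));
    }
}
```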

89

Kafka has several features for reducing the need to move data on startup

- Standby Replicas
- Disk Checkpoints
- Compacted topics

90

9. Use Transactions to tie All Interactions Together

91

OrderRequested(IPad)

2a. Order Validated

2b. IPad Reserved

2c. Offset Commit

Internal State: Stock = 17, Reservations = 2

Tie Events & State with Transactions
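
The all-or-nothing behaviour can be sketched by staging the three effects (2a, 2b, 2c) and applying them only at commit. The `OrderTransaction` class and its `crashBeforeCommit` flag are hypothetical stand-ins; with Kafka itself this guarantee would come from the producer's transaction support rather than from application code like this.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical service state; the starting numbers are the ones shown on the slide.
class OrderTransaction {
    int stock = 17;
    int reservations = 2;
    long committedOffset = 0; // next input offset to read
    final List<String> outputEvents = new ArrayList<>();

    // Processes the input event at `offset`. All three effects apply together or not at all.
    boolean process(long offset, String item, boolean crashBeforeCommit) {
        // Stage the effects without touching visible state.
        String stagedEvent = "OrderValidated(" + item + ")"; // 2a
        int stagedReservations = reservations + 1;           // 2b
        long stagedOffset = offset + 1;                      // 2c

        if (crashBeforeCommit || stagedReservations > stock) {
            return false; // abort: staged work is dropped and nothing becomes visible
        }
        // Commit: publish the event, update state, and advance the offset as one unit.
        outputEvents.add(stagedEvent);
        reservations = stagedReservations;
        committedOffset = stagedOffset;
        return true;
    }

    public static void main(String[] args) {
        OrderTransaction tx = new OrderTransaction();
        tx.process(0, "IPad", true);  // simulated failure: no visible change
        tx.process(0, "IPad", false); // retry succeeds
        System.out.println(tx.reservations + " reservations, offset " + tx.committedOffset);
    }
}
```

Because the offset only moves when the event and the state change are applied, a retry after a failure reprocesses the same input without duplicating its effects.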

92

Connect

TRANSACTION

Order Requested

Order Validated

Order Completed

Order Received

Products

Browser

Webserver

Schema Registry

OrdersService

Reserved Stocks

Stock

Stock

Reserved Stocks

Transactions

KAFKA

93

10. Bridge the Sync/Async Divide with a Streaming Ecosystem

94

POST

GET

Load Balancer

ORDERS

ORDERS

OV TOPIC

Order Validations

KAFKA

INVENTORY

Orders

Inventory

Fraud Service

Order Details Service

Inventory Service

(see previous figure)

Order Created

Order Validated

Orders View

Q in CQRS

Orders Service

C in CQRS

Services in the Micro: Orders Service

Find the code online!

95

Orders Customers

Payments Stock

Each service is optimized for autonomy

A Database Inside Out

HISTORICAL EVENT STREAMS

96

Kafka

KAFKA

New York

Tokyo

London

Global / Disconnected Ecosystems

97

So…

98

Good architectures have little to do with this:

99

It’s about how systems evolve over time

100

Request driven isn’t enough

• High coupling
• Hard to handle async flows
• Hard to move and join datasets.

101

Leverage the Duality of Events

Notification

Data replication

102

With a toolset built for data in flight

103

The data dichotomy

Data systems are about exposing data.

Services are about hiding it.

Remember the data dichotomy

104

The Data Dichotomy

We want all the good stuff which comes with a database.

We don’t want to share that database with anyone else.

But we do want to share datasets in a sensible way.

105

• Broadcast events
• Retain them in the log
• Compose streaming functions
• Recast the event stream into views when you need to query.

Event Driven Services

106

Services built on a Streaming Platform

107

Thank You

@benstopford

Blog Series: https://www.confluent.io/blog/tag/microservices/

Code: https://github.com/confluentinc/kafka-streams-examples