Strata Software Architecture NY: The Data Dichotomy

82
The Data Dichotomy: Rethinking Data and Services with Streams Ben Stopford @benstopford

Transcript of Strata Software Architecture NY: The Data Dichotomy

Page 1: Strata Software Architecture NY: The Data Dichotomy

The Data Dichotomy: Rethinking Data and Services

with Streams Ben Stopford @benstopford

Page 2: Strata Software Architecture NY: The Data Dichotomy
Page 3: Strata Software Architecture NY: The Data Dichotomy

What are microservices really about?

Page 4: Strata Software Architecture NY: The Data Dichotomy

GUI

UI Service

Orders Service

Returns Service

Fulfilment Service

Payment Service

Stock Service

Splitting the Monolith? Single Process /Code base

/Deployment

Many Processes /Code bases

/Deployments

Page 5: Strata Software Architecture NY: The Data Dichotomy

Orders Service

Autonomy?

Page 6: Strata Software Architecture NY: The Data Dichotomy

Orders Service

Stock Service

Email Service

Fulfillment Service

Independently Deployable

Independently Deployable

Independently Deployable

Independently Deployable

Page 7: Strata Software Architecture NY: The Data Dichotomy

Independence is where services get

their value

Page 8: Strata Software Architecture NY: The Data Dichotomy

Allows Scaling

Page 9: Strata Software Architecture NY: The Data Dichotomy

Scaling in terms of people

What happens when we grow?

Page 10: Strata Software Architecture NY: The Data Dichotomy

Companies are inevitably a collection of applications They must work together to some degree

Page 11: Strata Software Architecture NY: The Data Dichotomy

Interconnection is an afterthought

FTP / Enterprise Messaging etc

Page 12: Strata Software Architecture NY: The Data Dichotomy

Internal World

External world

External world

External World

External world

Service Boundary

Services Force Us To Consider The External World

Page 13: Strata Software Architecture NY: The Data Dichotomy

Internal World

External world

External world

External World

External world

Service Boundary

External World is something we should Design For

Page 14: Strata Software Architecture NY: The Data Dichotomy

But this independence

comes at a cost $$$

Page 15: Strata Software Architecture NY: The Data Dichotomy

Orders Object

Statement Object getOpenOrders(),

Less Surface Area = Less Coupling

Encapsulate State

Encapsulate Functionality

Consider Two Objects in one address space

Encapsulation => Loose Coupling

Page 16: Strata Software Architecture NY: The Data Dichotomy

Change 1 Change 2

Redeploy

Singly-deployable apps are easy

Page 17: Strata Software Architecture NY: The Data Dichotomy

Orders Service Statement Service getOpenOrders(),

Independently Deployable Independently Deployable

Synchronized changes are painful

Page 18: Strata Software Architecture NY: The Data Dichotomy

Services work best where requirements are isolated in a single

bounded context

Page 19: Strata Software Architecture NY: The Data Dichotomy

Single Sign On Business Service authorise(),

It’s unlikely that a Business Service would need the internal SSO state / function to change

Single Sign On

Page 20: Strata Software Architecture NY: The Data Dichotomy

SSO has a tightly bounded context

Page 21: Strata Software Architecture NY: The Data Dichotomy

But business services are

different

Page 22: Strata Software Architecture NY: The Data Dichotomy

Catalog Authorisation

Most business services share the same core stream of facts.

Most services live in here

Page 23: Strata Software Architecture NY: The Data Dichotomy

The futures of business services

are far more tightly intertwined.

Page 24: Strata Software Architecture NY: The Data Dichotomy

We need encapsulation to hide internal state. Be loosely coupled.

But we need the freedom to slice & dice shared data like any other

dataset

Page 25: Strata Software Architecture NY: The Data Dichotomy

But data systems have little to do

with encapsulation

Page 26: Strata Software Architecture NY: The Data Dichotomy

Service Database

Data on inside

Data on outside

Data on inside

Data on outside

Interface hides data

Interface amplifies

data

Databases amplify the data they hold

Page 27: Strata Software Architecture NY: The Data Dichotomy

The data dichotomy Data systems are about exposing data.

Services are about hiding it.

Page 28: Strata Software Architecture NY: The Data Dichotomy

Service Database

Data on inside

Data on outside

Data on inside

Data on outside

Encapsulation protects us

against future change.

Databases provide ultimate flexibility for the

now

These approaches serve different goals

Page 29: Strata Software Architecture NY: The Data Dichotomy

This affects services in one of

two ways

Page 30: Strata Software Architecture NY: The Data Dichotomy

getOpenOrders( fulfilled=false,

deliveryLocation=CA, orderValue=100,

operator=GreatherThan)

Either (1) we constantly add to the interface, as datasets grow

Page 31: Strata Software Architecture NY: The Data Dichotomy

getOrder(Id)

getOrder(UserId)

getAllOpenOrders()

getAllOrdersUnfulfilled(Id)

getAllOrders()

(1) Services can end up looking like kookie home-grown databases

Page 32: Strata Software Architecture NY: The Data Dichotomy

...and DATA amplifies this “God Service”

problem

Page 33: Strata Software Architecture NY: The Data Dichotomy

(2) Give up and move whole datasets en masse

getAllOrders()

Data Copied Internally

Page 34: Strata Software Architecture NY: The Data Dichotomy

This leads to a different problem

Page 35: Strata Software Architecture NY: The Data Dichotomy

Data diverges over time

Orders Service

Stock Service

Email Service

Fulfill- ment Service

Page 36: Strata Software Architecture NY: The Data Dichotomy

The more mutable copies,

the more data will diverge over time

Page 37: Strata Software Architecture NY: The Data Dichotomy

Nice neat services

Service contract too

limiting

Can we change both services

together, easily? NO

Broaden Contract

Eek $$$

NO

YES

Is it a shared

Database?

Frack it! Just give me ALL the data

Data diverges. (many different

versions of the same facts) Lets encapsulate

Lets centralise to one copy

YES

Start here

Cycle of inadequacy:

Page 38: Strata Software Architecture NY: The Data Dichotomy

These forces compete in the systems we build

Divergence

Accessibility

Encapsulation

Page 39: Strata Software Architecture NY: The Data Dichotomy

Shared database

Service Interfaces

Better Accessibility

Better Independence

No perfect solution

Page 40: Strata Software Architecture NY: The Data Dichotomy

Is there a better way?

Page 41: Strata Software Architecture NY: The Data Dichotomy

Data on the Inside

Data on the Outside

Data on the Outside

Data on the Outside

Data on the Outside

Service Boundary

Page 42: Strata Software Architecture NY: The Data Dichotomy

Make data-on-the-outside a 1st class citizen

Page 43: Strata Software Architecture NY: The Data Dichotomy

Separate Reads & Writes with Immutable Streams

Page 44: Strata Software Architecture NY: The Data Dichotomy

Orders Service

Stock Service

Kafka

Fulfilment Service

UI Service

Fraud Service

Event Driven + CQRS + Stateful Stream Processing

Commands go to owning services

Queries are interpreted entirely inside in each

bounded context

Page 45: Strata Software Architecture NY: The Data Dichotomy

Event Driven + CQRS + Stateful Stream Processing

Orders

Service

Stock

Service

Orders

Fulfilment

Service

UI

Service

Fraud

Service

•  Events form a central Shared Narrative

•  Writes/Commands => Go to “owning” service

•  Reads/Queries => Wholly owned by each Service

•  Stream processing toolkit

Page 46: Strata Software Architecture NY: The Data Dichotomy

Kafka is a Streaming Platform

Page 47: Strata Software Architecture NY: The Data Dichotomy

Kafka: a Streaming Platform

The Log Connectors Connectors

Producer Consumer

Streaming Engine

Page 48: Strata Software Architecture NY: The Data Dichotomy

The Log

Page 49: Strata Software Architecture NY: The Data Dichotomy

Kafka is a type of messaging system

Writers Kafka Readers

Important Properties: •  Petabytes of data •  Scales out linearly •  Built in fault tolerance •  Log can be replayed

Page 50: Strata Software Architecture NY: The Data Dichotomy

Scaling becomes a concern of the broker, not the

service

Page 51: Strata Software Architecture NY: The Data Dichotomy

Services Naturally Load Balance

Page 52: Strata Software Architecture NY: The Data Dichotomy

This provides Fault Tolerance

Page 53: Strata Software Architecture NY: The Data Dichotomy

Reset to any point in the shared narrative

Rewind & Replay

Page 54: Strata Software Architecture NY: The Data Dichotomy

Compacted Log (retains only latest version)

Version 3

Version 2

Version 1

Version 2

Version 1

Version 5

Version 4

Version 3

Version 2

Version 1

Page 55: Strata Software Architecture NY: The Data Dichotomy

Service Backbone Scalable, Fault Tolerant, Concurrent, Strongly Ordered, Retentive

The Log Connectors Connectors

Producer Consumer

Streaming Engine

Page 56: Strata Software Architecture NY: The Data Dichotomy

A place to keep the data-on-the-outside as an immutable

narrative

Page 57: Strata Software Architecture NY: The Data Dichotomy

Now add Stream Processing

Page 58: Strata Software Architecture NY: The Data Dichotomy

Kafka’s Streaming API A general, embeddable streaming engine

The Log Connectors Connectors

Producer Consumer

Streaming Engine

Page 59: Strata Software Architecture NY: The Data Dichotomy

What is Stream Processing?

A machine for combining and processing streams of events

Page 60: Strata Software Architecture NY: The Data Dichotomy

Features: similar to database query engine

Join Filter Aggr- egate

View

Window

Page 61: Strata Software Architecture NY: The Data Dichotomy

Stateful Stream Processing adds Tables & Local State

stream

Compacted stream

Join

Stream Data

Stream-Tabular Data

Domain Logic

Tables /

Views

Kafka

Event Driven Service

Page 62: Strata Software Architecture NY: The Data Dichotomy

Avg(o.time – p.time) From orders, payment, user Group by user.region over 1 day window emitting every second

Stateful Stream Processing

Streams

Stream Processing Engine

Derived “Table”

Table Domain Logic

Page 63: Strata Software Architecture NY: The Data Dichotomy

Tools make it easy to join streams of events (or tables of data) from

different services

Email Service

Legacy App Orders Payments Stock

KSTREAMS

Kafka

Page 64: Strata Software Architecture NY: The Data Dichotomy

So we might join two event streams and create a local

store in our own Domain Model

Orders Service

Payment Events

Purchase Events

PurchasePayments

Page 65: Strata Software Architecture NY: The Data Dichotomy

Make the projections queryable

Orders Service

Kafka

Queryable State API

Context Boundary

Page 66: Strata Software Architecture NY: The Data Dichotomy

Blend the concepts of services communicating and services sharing state

customer orders

catalogue Pay-

ments

Page 67: Strata Software Architecture NY: The Data Dichotomy

Very different to sharing a database

customer orders

catalogue Pay-

ments

Services communicate and share data via the log

Slicing & Dicing is embedded in each bounded context

Page 68: Strata Software Architecture NY: The Data Dichotomy

But sometimes you have to move data

Page 69: Strata Software Architecture NY: The Data Dichotomy

Easy to drop out to an external database if needed

DB of choice

Kafka Connect

API Fraud Service

Context Boundary

Page 70: Strata Software Architecture NY: The Data Dichotomy

Oracle

Connector API & Open Source Ecosystem make this easier

The Log

Connector Connector

Producer Consumer

Streaming Engine

Cassandra

Page 71: Strata Software Architecture NY: The Data Dichotomy

WIRED Principals •  Wimpy: Start simple, lightweight and fault tolerant.

•  Immutable: Build a retentive, shared narrative.

•  Reactive: Leverage Asynchronicity. Focus on the now.

•  Evolutionary: Use only the data you need today.

•  Decentralized: Receiver driven. Coordination avoiding. No

God services

Page 72: Strata Software Architecture NY: The Data Dichotomy

So…

Page 73: Strata Software Architecture NY: The Data Dichotomy

Architecture is really about designing for change

Page 74: Strata Software Architecture NY: The Data Dichotomy

The data dichotomy Data systems are about exposing data.

Services are about hiding it.

Remember:

Page 75: Strata Software Architecture NY: The Data Dichotomy

Shared data is a reality for most organisations

Catalog

Page 76: Strata Software Architecture NY: The Data Dichotomy

Embrace data that lives and flows between services

Page 77: Strata Software Architecture NY: The Data Dichotomy

Canonical Immutable Streams

Embed the ability to slice and dice into each bounded context

Distributed Log Event Driven

Service

Page 78: Strata Software Architecture NY: The Data Dichotomy

Shared database

Service Interfaces

Better Accessibility

Better Independence

Balance the Data Dichotomy

Page 79: Strata Software Architecture NY: The Data Dichotomy

Create a canonical shared narrative

Page 80: Strata Software Architecture NY: The Data Dichotomy

Use stream processing to handle data-on-the-

outside whilst prioritizing the NOW

Page 81: Strata Software Architecture NY: The Data Dichotomy

Event Driven Services

The Log (streams & tables)

Ingestion Services

Services with Polyglotic

persistence

Simple Services

Streaming Services

Page 82: Strata Software Architecture NY: The Data Dichotomy

Twitter: @benstopford

Blog of this talk at http://confluent.io/blog