Intro to Akka Streams


Transcript of Intro to Akka Streams

Page 1: Intro to Akka Streams

streams

Page 2: Intro to Akka Streams

Agenda

• Reactive Streams

• Why Akka Streams?

• API Overview

Page 3: Intro to Akka Streams

Reactive Streams

Page 4: Intro to Akka Streams

public interface Publisher<T> {
    public void subscribe(Subscriber<? super T> s);
}

public interface Subscriber<T> {
    public void onSubscribe(Subscription s);
    public void onNext(T t);
    public void onError(Throwable t);
    public void onComplete();
}

public interface Processor<T, R> extends Subscriber<T>, Publisher<R> {}

public interface Subscription {
    public void request(long n);
    public void cancel();
}

Reactive Streams

Page 5: Intro to Akka Streams

A standardised spec/contract to achieve asynchronous back-pressured stream processing.
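To make the contract concrete, here is a minimal sketch using the JDK's java.util.concurrent.Flow (Java 9+), whose nested interfaces mirror Publisher, Subscriber, Subscription and Processor. The class names OneAtATimeDemo and CollectingSubscriber are made up for this illustration and are not from the talk. The subscriber asks for one element at a time, so the publisher can never run ahead of it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

final class OneAtATimeDemo {

    /** Hypothetical subscriber: asks for exactly one element at a time. */
    static final class CollectingSubscriber implements Flow.Subscriber<Integer> {
        final List<Integer> received = new ArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1);                 // nothing is delivered until we ask
        }
        @Override public void onNext(Integer item) {
            received.add(item);
            subscription.request(1);      // ready for the next element
        }
        @Override public void onError(Throwable t) { done.countDown(); }
        @Override public void onComplete()         { done.countDown(); }
    }

    static List<Integer> run() {
        CollectingSubscriber sub = new CollectingSubscriber();
        // SubmissionPublisher is the JDK's ready-made Publisher; it delivers
        // elements asynchronously on another thread.
        try (SubmissionPublisher<Integer> pub = new SubmissionPublisher<>()) {
            pub.subscribe(sub);
            for (int i = 1; i <= 5; i++) pub.submit(i);
        } // close() signals onComplete after the submitted elements
        try {
            sub.done.await();             // wait for the async delivery to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sub.received;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [1, 2, 3, 4, 5]
    }
}
```

The latch is only there so the calling thread can wait for the asynchronous delivery; it is not part of the Reactive Streams contract.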

Page 6: Intro to Akka Streams

Standardised ?

Gives us consistent interop between libraries and platforms that implement this spec.

Page 7: Intro to Akka Streams
Page 8: Intro to Akka Streams

everything is async & back-pressured

Page 9: Intro to Akka Streams

Reactive Streams

Stream API Stream API Stream API

Page 10: Intro to Akka Streams

Reactive Streams

Stream API Stream API Stream API

Users use this API

Page 11: Intro to Akka Streams

Reactive Streams

Stream API Stream API Stream API

Users use this API

Library authors use this API

Page 12: Intro to Akka Streams

Async?

Page 13: Intro to Akka Streams

• We know async IO from last week

• But there are other kinds of async operations, ones that cross different async boundaries:

• between applications

• between threads

• and over the network as we saw

Page 14: Intro to Akka Streams

Back-Pressured ?

Page 15: Intro to Akka Streams

Publisher[T] Subscriber[T]

Page 16: Intro to Akka Streams

Think abstractly about these lines.

“async boundary”

This can be the network, or threads on the same CPU.

Publisher[T] Subscriber[T]

Page 17: Intro to Akka Streams

What problem are we trying to solve?

Discrepancy in the rate of processing

• Fast Publisher / Slow Subscriber

• Slow Publisher / Fast Subscriber

Page 18: Intro to Akka Streams

Push Model

Page 19: Intro to Akka Streams

Publisher[T] Subscriber[T]

100 messages / 1 second

1 message / 1 second

Fast Slow

Page 20: Intro to Akka Streams

Publisher[T] Subscriber[T]

Page 21: Intro to Akka Streams

Publisher[T] Subscriber[T]

drop overflowed

require resending

Page 22: Intro to Akka Streams

Publisher[T] Subscriber[T]

has to keep track of messages to resend

not safe & complicated
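The overflow problem can be put in numbers with a small single-threaded simulation. This is only a sketch: the buffer capacity of 10 and the names PushOverflow and dropped are assumptions for illustration, while the 100-to-1 rate comes from the slide above. Every message that finds the buffer full is lost, and the publisher would have to remember it to resend later.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class PushOverflow {

    /**
     * Pushes `messages` items at a subscriber whose buffer holds `capacity`
     * items. The subscriber handles one item per `drainEvery` pushes.
     * Returns how many messages were dropped because the buffer was full.
     */
    static int dropped(int messages, int capacity, int drainEvery) {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(capacity);
        int dropped = 0;
        for (int i = 1; i <= messages; i++) {
            if (!buffer.offer(i)) {
                dropped++;                 // buffer full: this message is lost
            }
            if (i % drainEvery == 0) {
                buffer.poll();             // the slow subscriber handles one item
            }
        }
        return dropped;
    }

    public static void main(String[] args) {
        // 100 pushes per subscriber step, buffer of 10: 90 messages are dropped.
        System.out.println(dropped(100, 10, 100)); // 90
    }
}
```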

Page 23: Intro to Akka Streams

NACK ?

Page 24: Intro to Akka Streams

Publisher[T] Subscriber[T]

Page 25: Intro to Akka Streams

Publisher[T] Subscriber[T]

stop!

Page 26: Intro to Akka Streams

Publisher[T] Subscriber[T]

stop!

Page 27: Intro to Akka Streams

Publisher[T] Subscriber[T]

stop!

sh#t!

Page 28: Intro to Akka Streams

Publisher[T] Subscriber[T]

publisher didn’t receive NACK in time

so we lost that last message

not safe

Page 29: Intro to Akka Streams

Pull ?

Page 30: Intro to Akka Streams

Publisher[T] Subscriber[T]

100 messages / 1 second

1 message / 1 second

Fast Slow

Page 31: Intro to Akka Streams

Publisher[T] Subscriber[T]

gimme!

Page 32: Intro to Akka Streams

Publisher[T] Subscriber[T]

gimme!

Page 33: Intro to Akka Streams

Publisher[T] Subscriber[T]

Page 34: Intro to Akka Streams

Publisher[T] Subscriber[T]

gimme!

Page 35: Intro to Akka Streams

Publisher[T] Subscriber[T]

gimme!

Page 36: Intro to Akka Streams

Publisher[T] Subscriber[T]

gimme!

Page 37: Intro to Akka Streams

Publisher[T] Subscriber[T]

gimme!

Page 38: Intro to Akka Streams

Publisher[T] Subscriber[T]

gimme!

Page 39: Intro to Akka Streams

Publisher[T] Subscriber[T]

gimme!

Page 40: Intro to Akka Streams

• Spam!

• Redundant messaging -> flooding the connection

• No buffer/batch support

Page 41: Intro to Akka Streams

A different approach

Page 42: Intro to Akka Streams

We have to take into account the following scenarios:

• Fast Pub / Slow Sub

• Slow Pub / Fast Sub

Which can happen dynamically

Page 43: Intro to Akka Streams

Publisher[T] Subscriber[T]

Data

Demand(n)

Page 44: Intro to Akka Streams

Publisher[T] Subscriber[T]

Data

Demand(n)

Dynamic Push/Pull

bounded buffers with no overflow

demand can be accumulated

batch processing -> performance
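The Data / Demand(n) exchange can be sketched with the JDK's java.util.concurrent.Flow interfaces. Everything named here (DemandDemo, RangePublisher, BatchingSubscriber, the batch size of 4) is hypothetical, and the publisher is a deliberately simplified, synchronous, single-threaded toy rather than a spec-compliant implementation (for example, it does not reject non-positive requests). It emits only while there is outstanding demand, and the subscriber tops up demand one batch at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow;

final class DemandDemo {

    /** Toy publisher of 1..count that never emits more than was requested. */
    static final class RangePublisher implements Flow.Publisher<Integer> {
        private final int count;
        RangePublisher(int count) { this.count = count; }

        @Override public void subscribe(Flow.Subscriber<? super Integer> sub) {
            sub.onSubscribe(new Flow.Subscription() {
                private int next = 1;
                private long demand = 0;
                private boolean emitting = false;
                private boolean done = false;

                @Override public void request(long n) {
                    if (done) return;
                    demand += n;            // demand accumulates
                    if (emitting) return;   // called from inside onNext: the loop below carries on
                    emitting = true;
                    while (!done && demand > 0 && next <= count) {
                        demand--;
                        sub.onNext(next++);
                    }
                    emitting = false;
                    if (!done && next > count) { done = true; sub.onComplete(); }
                }
                @Override public void cancel() { done = true; }
            });
        }
    }

    /** Subscriber that requests a fixed batch and asks again once it is used up. */
    static final class BatchingSubscriber implements Flow.Subscriber<Integer> {
        final List<Integer> received = new ArrayList<>();
        final List<Integer> requests = new ArrayList<>();
        boolean completed = false;
        private final int batchSize;
        private int pending = 0;
        private Flow.Subscription subscription;

        BatchingSubscriber(int batchSize) { this.batchSize = batchSize; }

        private void ask() {
            pending = batchSize;            // set before request(): delivery may be synchronous
            requests.add(batchSize);
            subscription.request(batchSize);
        }
        @Override public void onSubscribe(Flow.Subscription s) { subscription = s; ask(); }
        @Override public void onNext(Integer item) {
            received.add(item);
            if (--pending == 0) ask();      // batch consumed: signal more demand
        }
        @Override public void onError(Throwable t) { completed = true; }
        @Override public void onComplete() { completed = true; }
    }

    static BatchingSubscriber run(int count, int batchSize) {
        BatchingSubscriber sub = new BatchingSubscriber(batchSize);
        new RangePublisher(count).subscribe(sub);
        return sub;
    }

    public static void main(String[] args) {
        BatchingSubscriber sub = run(10, 4);
        System.out.println(sub.received); // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
        System.out.println(sub.requests); // [4, 4, 4]
    }
}
```

Ten elements cost three demand signals instead of ten "gimme!" round trips, and the subscriber never holds more than one batch it did not ask for.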

Page 45: Intro to Akka Streams

• Cool let’s implement this using Actors!

• We can, it’s possible … but should it be done ?

Page 46: Intro to Akka Streams

The problem(s) with Akka Actors

Page 47: Intro to Akka Streams

Type Safety

Any => Unit

Page 48: Intro to Akka Streams

Composition

In FP this makes us warm and fuzzy

val f: A => B

val g: B => C

val h: A => C = f andThen g

Page 49: Intro to Akka Streams

• Using Actors?

• An Actor is aware of who sent it a message and of where it must forward the message or send its reply.

• No compositionality without thinking about it explicitly.

Page 50: Intro to Akka Streams

Data Flow

• What are streams ? Flows of data.

• Imagine a 10 stage data pipeline you want to model

• Now imagine writing that in Actors.

Page 51: Intro to Akka Streams
Page 52: Intro to Akka Streams

• Following the flow of data in Actors requires jumping around all over the code base

• Low level, error prone and hard to reason about

Page 53: Intro to Akka Streams

Akka Streams API building blocks

Page 54: Intro to Akka Streams

Design Philosophy

• Everything we cover from here on is a blueprint that describes the actions/effects it will perform.

• Reusability

• Compositionality

Page 55: Intro to Akka Streams

• “Design your program with a pure functional core, push side-effects to the end of the world and detonate to execute.”

- some guy on stackoverflow

Page 56: Intro to Akka Streams

Source

• Publisher of data

• Exactly one output

Image from boldradius.com

Page 57: Intro to Akka Streams

val singleSrc = Source.single(1)

val iteratorSrc = Source.fromIterator(() => Iterator from 0)

val futureSrc = Source.fromFuture(Future("abc"))

val collectionSrc = Source(List(1,2,3))

// needs: import scala.concurrent.duration._
val tickSrc = Source.tick(initialDelay = 1.second, interval = 1.second, tick = "tick-tock")

val requestSource = req.entity.dataBytes

Page 58: Intro to Akka Streams

Sink

• Subscriber (consumer) of data

• Describes where the data in our stream will go.

• Exactly one input

Image from boldradius.com

Page 59: Intro to Akka Streams

Sink.head

Sink.reduce[Int]((a, b) => a + b)

Sink.fold[Int, Int](0)(_ + _)

Sink.foreach[String](println)

FileIO.toPath(Paths.get("file.txt"))

Page 60: Intro to Akka Streams

val fold: Sink[Int, Future[Int]] = Sink.fold[Int, Int](0)(_ + _)

Page 61: Intro to Akka Streams

val fold: Sink[Int, Future[Int]] = Sink.fold[Int, Int](0)(_ + _)

Input type

Page 62: Intro to Akka Streams

val fold: Sink[Int, Future[Int]] = Sink.fold[Int, Int](0)(_ + _)

Materialized type

Page 63: Intro to Akka Streams

val fold: Sink[Int, Future[Int]] = Sink.fold[Int, Int](0)(_ + _)

Materialized type

Available when the stream ‘completes’

Page 64: Intro to Akka Streams

val fold: Sink[Int, Future[Int]] = Sink.fold[Int, Int](0)(_ + _)

val futureRes: Future[Int] = Source(1 to 10).runWith(fold)

futureRes.foreach(println)

// 55

Page 65: Intro to Akka Streams

So I can get data from somewhere

and I can put data somewhere else.

But I want to do something with it.

Page 66: Intro to Akka Streams

Flow

• A processor of data

• Has one input and one output

Image from boldradius.com

Page 67: Intro to Akka Streams

val double: Flow[Int, Int, NotUsed] = Flow[Int].map(_ * 2)

Page 68: Intro to Akka Streams

val src = Source(1 to 10)

val double = Flow[Int].map(_ * 2)

val negate = Flow[Int].map(_ * -1)

val print = Sink.foreach[Int](println)

val graph = src via double via negate to print

graph.run()

// prints -2, -4, -6, -8, -10, -12, -14, -16, -18, -20 (one per line)

Page 69: Intro to Akka Streams

• Flow is immutable, thread-safe, and thus freely shareable

Page 70: Intro to Akka Streams

• Are Linear flows enough ?

• No, we want to be able to describe arbitrarily complex steps in our pipelines

Page 71: Intro to Akka Streams

Graphs

Page 72: Intro to Akka Streams

Flow

Page 73: Intro to Akka Streams

Graph

Page 74: Intro to Akka Streams

• We define multiple linear flows and then use the Graph DSL to connect them.

• We can combine multiple streams - fan in

• Split a stream into substreams - fan out

Page 75: Intro to Akka Streams

Fan-Out

Page 76: Intro to Akka Streams

Fan-In

Page 77: Intro to Akka Streams

A little example

Page 78: Intro to Akka Streams

Some sort of video uploading service

- Stream in video

- Process it

- Store it

Page 79: Intro to Akka Streams

[Diagram: ByteString input -> bcast -> "Convert to Array[Byte]" flow -> bcast -> Process HighRes flow / Process MedRes flow / Process LowRes flow -> one sink each]

Page 80: Intro to Akka Streams

Sink.fromGraph(GraphDSL.create(highRes, mediumRes, lowRes)((_, _, _)) { implicit b =>
  (highSink, mediumSink, lowSink) => {
    import GraphDSL.Implicits._

    val bcastInput = b.add(Broadcast[ByteString](1))
    val bcastRawBytes = b.add(Broadcast[Array[Byte]](3))

    // The bodies of the three processing flows are elided on the slide.
    val processHigh: Flow[Array[Byte], ByteString, NotUsed]
    val processMedium: Flow[Array[Byte], ByteString, NotUsed]
    val processLow: Flow[Array[Byte], ByteString, NotUsed]

    // byteAcc is not defined on the slide; going by the diagram, it is the
    // flow that converts the incoming ByteString to Array[Byte].
    bcastInput.out(0) ~> byteAcc ~> bcastRawBytes ~> processHigh   ~> highSink
                                    bcastRawBytes ~> processMedium ~> mediumSink
                                    bcastRawBytes ~> processLow    ~> lowSink

    SinkShape(bcastInput.in)
  }
})

Our custom Sink

Page 81: Intro to Akka Streams

Has one input of type ByteString

Page 82: Intro to Akka Streams

Takes 3 Sinks, which can be Files, DBs, etc.

Has one input of type ByteString

Page 83: Intro to Akka Streams

Describes 3 processing stages that are Flows of Array[Byte] => ByteString

Has one input of type ByteString

Takes 3 Sinks, which can be Files, DBs, etc.

Page 84: Intro to Akka Streams

Describes 3 processing stages that are Flows of Array[Byte] => ByteString

Has one input of type ByteString

Emits result to the 3 Sinks

Takes 3 Sinks, which can be Files, DBs, etc.

Page 85: Intro to Akka Streams

Has a type of: Sink[ByteString, (Future[IOResult], Future[IOResult], Future[IOResult])]

Page 86: Intro to Akka Streams

Sink[ByteString, (Future[IOResult], Future[IOResult], Future[IOResult])]

Materialized values

Page 87: Intro to Akka Streams

Things we didn’t have time for

• Integrating with Actors

• Buffering and throttling streams

• Defining custom Graph shapes and stages

Page 88: Intro to Akka Streams

Thanks for listening!