Intro to Akka Streams
michael-kendra -
Agenda
• Reactive Streams
• Why Akka Streams?
• API Overview
Reactive Streams
public interface Publisher<T> {
    public void subscribe(Subscriber<? super T> s);
}
public interface Subscriber<T> {
    public void onSubscribe(Subscription s);
    public void onNext(T t);
    public void onError(Throwable t);
    public void onComplete();
}
public interface Processor<T, R> extends Subscriber<T>, Publisher<R> {}
public interface Subscription {
    public void request(long n);
    public void cancel();
}
Reactive Streams
A standardised spec/contract to achieve asynchronous
back-pressured stream processing.
Standardised ?
Gives us consistent interop between libraries and platforms that implement this spec.
everything is async & back-pressured
Reactive Streams
Stream API Stream API Stream API
Users use this API
Library authors use this API
Async?
• We know async IO from last week
• But there are other types of async operations, that cross over different async boundaries
• between applications
• between threads
• and over the network as we saw
Back-Pressured ?
Publisher[T] -> Subscriber[T]
Think abstractly about the lines connecting them: each one is an “async boundary”.
This can be the network, or threads on the same CPU.
What problem are we trying to solve?
Discrepancy in the rate of processing
• Fast Publisher / Slow Subscriber
• Slow Publisher / Fast Subscriber
Push Model
Publisher[T] -> Subscriber[T]
• Fast Publisher: 100 messages / 1 second
• Slow Subscriber: 1 message / 1 second
• The subscriber has to drop overflowed messages and require resending
• The publisher has to keep track of messages to resend: not safe & complicated
NACK ?
Publisher[T] -> Subscriber[T]
• The subscriber tells the publisher to stop!
• The publisher didn’t receive the NACK in time, so we lost that last message
• Not safe
Pull ?
Publisher[T] <- Subscriber[T]
• Fast Publisher: 100 messages / 1 second
• Slow Subscriber: 1 message / 1 second
• The subscriber has to ask for every single message: gimme!
• Spam!
• Redundant messaging -> flooding the connection
• No buffer/batch support
A different approach
We have to take into account the following scenarios:
• Fast Pub / Slow Sub
• Slow Pub / Fast Sub
Which can happen dynamically
Publisher[T] -> Subscriber[T]: Data
Publisher[T] <- Subscriber[T]: Demand(n)
Dynamic Push/Pull
• Bounded buffers with no overflow
• Demand can be accumulated
• Batch processing -> performance
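The Data / Demand(n) exchange above can be sketched with the JDK's java.util.concurrent.Flow interfaces (JDK 9+), which mirror the Reactive Streams interfaces shown earlier. This is only an illustration under stated assumptions: BatchingSubscriber and DemandDemo are hypothetical names, the batch size of 4 is arbitrary, and SubmissionPublisher stands in for a real publisher.

```scala
import java.util.concurrent.{CountDownLatch, Flow, SubmissionPublisher}
import scala.collection.mutable.ArrayBuffer

// Hypothetical subscriber: signals demand in batches instead of one element at a time.
final class BatchingSubscriber(batchSize: Int, done: CountDownLatch)
    extends Flow.Subscriber[Int] {
  private var subscription: Flow.Subscription = null
  private var outstanding = 0 // elements requested but not yet received
  val received = ArrayBuffer.empty[Int]

  override def onSubscribe(s: Flow.Subscription): Unit = {
    subscription = s
    outstanding = batchSize
    s.request(batchSize.toLong) // Demand(n): the publisher may send at most n elements
  }

  override def onNext(item: Int): Unit = {
    received += item
    outstanding -= 1
    if (outstanding == 0) { // batch consumed, so signal demand for the next one
      outstanding = batchSize
      subscription.request(batchSize.toLong)
    }
  }

  override def onError(t: Throwable): Unit = done.countDown()
  override def onComplete(): Unit = done.countDown()
}

object DemandDemo {
  def run(): List[Int] = {
    val done = new CountDownLatch(1)
    val publisher = new SubmissionPublisher[Int]()
    val subscriber = new BatchingSubscriber(4, done)
    publisher.subscribe(subscriber)
    (1 to 10).foreach(i => publisher.submit(i)) // buffered until the subscriber has demand
    publisher.close()                           // onComplete follows the last element
    done.await()
    subscriber.received.toList
  }
}
```

The publisher never sends more than the subscriber has asked for, and the subscriber sends one demand signal per batch rather than one per element.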
• Cool let’s implement this using Actors!
• We can, it’s possible … but should it be done ?
The problem(s) with Akka Actors
Type Safety
Any => Unit
Composition
In FP this makes us warm and fuzzy
val f: A => B
val g: B => C
val h: A => C = f andThen g
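A concrete instance of the composition above, with A = String, B = Int and C = Boolean. The names parse, isEven and parsesToEven are made up for this illustration.

```scala
object ComposeDemo {
  // f: A => B
  val parse: String => Int = _.trim.toInt
  // g: B => C
  val isEven: Int => Boolean = _ % 2 == 0
  // h: A => C, built without either function knowing about the other
  val parsesToEven: String => Boolean = parse andThen isEven
}
```

The compiler checks that the output type of the first function matches the input type of the second, which is exactly what an Any => Unit receive function cannot offer.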
• Using Actors?
• An Actor has to know who sent it a message and where it must forward the message or send the reply.
• There is no compositionality unless you design for it explicitly.
Data Flow
• What are streams ? Flows of data.
• Imagine a 10 stage data pipeline you want to model
• Now imagine writing that in Actors.
• Following the flow of data in Actors requires jumping around all over the code base
• Low level, error prone and hard to reason about
Akka Streams API building blocks
Design Philosophy
• Everything we will cover now is a blueprint that describes the actions/effects it performs.
• Reusability
• Compositionality
• “Design your program with a pure functional core, push side-effects to the end of the world and detonate to execute.”
- some guy on stackoverflow
Source
val singleSrc = Source.single(1)
val iteratorSrc = Source.fromIterator(() => Iterator from 0)
val futureSrc = Source.fromFuture(Future("abc"))
val collectionSrc = Source(List(1, 2, 3))
// "1 second" needs import scala.concurrent.duration._
val tickSrc = Source.tick(initialDelay = 1 second,
                          interval = 1 second,
                          tick = "tick-tock")
val requestSource = req.entity.dataBytes
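Because a Source is only a blueprint, the same value can be run more than once. The sketch below assumes Akka 2.6 or later on the classpath (where an implicit ActorSystem is enough to run a stream). SourceDemo and firstThree are hypothetical names, and Sink.seq is used only to collect the elements.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._

object SourceDemo {
  def run(): (Seq[Int], Seq[Int]) = {
    implicit val system: ActorSystem = ActorSystem("source-demo")
    try {
      // An infinite iterator source, cut down to its first three elements.
      val firstThree = Source.fromIterator(() => Iterator.from(0)).take(3)
      // Each run materializes a fresh stream from the same blueprint.
      val first = Await.result(firstThree.runWith(Sink.seq[Int]), 5.seconds)
      val second = Await.result(firstThree.runWith(Sink.seq[Int]), 5.seconds)
      (first, second)
    } finally system.terminate()
  }
}
```

Blocking with Await is done here only to keep the example short.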
Sink
• Subscriber (consumer) of data
• Describes where the data in our stream will go.
• Exactly one input
Sink.head
Sink.reduce[Int]((a, b) => a + b)
Sink.fold[Int, Int](0)(_ + _)
Sink.foreach[String](println)
FileIO.toPath(Paths.get("file.txt"))
val fold: Sink[Int, Future[Int]] = Sink.fold[Int, Int](0)(_ + _)
• Input type: Int
• Materialized type: Future[Int], available when the stream ‘completes’
val fold: Sink[Int, Future[Int]] = Sink.fold[Int, Int](0)(_ + _)
val futureRes: Future[Int] = Source(1 to 10).runWith(fold)
futureRes.foreach(println)
// 55
So I can get data from somewhere
and I can put data somewhere else.
But I want to do something with it.
Flow
• A processor of data
• Has one input and one output
val double: Flow[Int, Int, NotUsed] = Flow[Int].map(_ * 2)
val src = Source(1 to 10)
val double = Flow[Int].map(_ * 2)
val negate = Flow[Int].map(_ * -1)
val print = Sink.foreach[Int](println)
val graph = src via double via negate to print
graph.run()
// prints -2, -4, -6, -8, -10, -12, -14, -16, -18, -20 (one per line)
• Flow is immutable, thread-safe, and thus freely shareable
• Are linear flows enough ?
• No, we want to be able to describe arbitrarily complex steps in our pipelines
Graphs
Flow
Graph
• We define multiple linear flows and then use the Graph DSL to connect them.
• We can combine multiple streams - fan in
• Split a stream into substreams - fan out
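A minimal fan-out / fan-in sketch using the Graph DSL, assuming Akka 2.6 or later on the classpath. FanDemo is a hypothetical name. The double and negate flows are the ones from the earlier linear example, and Sink.seq is used only so the result can be inspected. Newer Akka versions also offer GraphDSL.createGraph for the same purpose.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.ClosedShape
import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL, Merge, RunnableGraph, Sink, Source}
import scala.collection.immutable
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object FanDemo {
  val double: Flow[Int, Int, NotUsed] = Flow[Int].map(_ * 2)
  val negate: Flow[Int, Int, NotUsed] = Flow[Int].map(_ * -1)

  // Blueprint: one source, split by Broadcast, two linear flows, joined by Merge.
  val graph: RunnableGraph[Future[immutable.Seq[Int]]] =
    RunnableGraph.fromGraph(GraphDSL.create(Sink.seq[Int]) { implicit b => sink =>
      import GraphDSL.Implicits._
      val bcast = b.add(Broadcast[Int](2)) // fan-out: every element goes to both branches
      val merge = b.add(Merge[Int](2))     // fan-in: elements arrive in no fixed order

      Source(1 to 3) ~> bcast ~> double ~> merge ~> sink
                        bcast ~> negate ~> merge
      ClosedShape
    })

  def run(): Seq[Int] = {
    implicit val system: ActorSystem = ActorSystem("fan-demo")
    // Sorted because Merge does not guarantee an ordering between the branches.
    try Await.result(graph.run(), 5.seconds).sorted
    finally system.terminate()
  }
}
```

Each branch stays an ordinary linear Flow; only the wiring between them needs the Graph DSL.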
Fan-Out
Fan-In
A little example
Some sort of video uploading service
- Stream in video
- Process it
- Store it
[Diagram: ByteString -> bcast -> "Convert to Array[Byte]" flow -> bcast -> Process HighRes flow / Process MedRes flow / Process LowRes flow -> one sink per flow]
Our custom Sink
Sink.fromGraph(GraphDSL.create(highRes, mediumRes, lowRes)((_, _, _)) {
  implicit b => (highSink, mediumSink, lowSink) =>
    import GraphDSL.Implicits._

    // A single-output broadcast whose inlet serves as the input of the whole Sink.
    val bcastInput = b.add(Broadcast[ByteString](1))
    val bcastRawBytes = b.add(Broadcast[Array[Byte]](3))

    // Defined elsewhere (not shown on the slide):
    //   byteAcc converts ByteString to Array[Byte]
    //   processHigh, processMedium, processLow: Flow[Array[Byte], ByteString, NotUsed]

    bcastInput.out(0) ~> byteAcc ~> bcastRawBytes ~> processHigh   ~> highSink
                                    bcastRawBytes ~> processMedium ~> mediumSink
                                    bcastRawBytes ~> processLow    ~> lowSink

    SinkShape(bcastInput.in)
})
• Has one input of type ByteString
• Takes 3 Sinks, which can be Files, DBs, etc.
• Describes 3 processing stages that are Flows of Array[Byte] => ByteString
• Emits results to the 3 Sinks
• Has a type of: Sink[ByteString, (Future[IOResult], Future[IOResult], Future[IOResult])]
• The tuple of Futures holds the materialized values
Things we didn’t have time for
• Integrating with Actors
• Buffering and throttling streams
• Defining custom Graph shapes and stages
Thanks for listening!